Abstract

This paper addresses the problem of recovering relative structure, in the form of an invariant, from two views of a 3D 
scene. The invariant structure is computed without any prior knowledge of camera geometry, or internal calibration, 
and with the property that perspective and orthographic projections are treated alike, namely, the system makes no 
assumption regarding the existence of perspective distortions in the input images. 

We show that, given the location of epipoles, the projective structure invariant can be constructed from only four 
corresponding points projected from four non-coplanar points in space (as in the case of parallel projection). This
result leads to two algorithms for computing projective structure. The first algorithm requires six corresponding 
points, four of which are assumed to be projected from four coplanar points in space. Alternatively, the second 
algorithm requires eight corresponding points, without assumptions of coplanarity of object points. 
Our study of projective structure is applicable to both structure from motion and visual recognition. We use 
projective structure to re-project the 3D scene from two model images and six or eight corresponding points with a 
novel view of the scene. The re-projection process is well-defined under all cases of central projection, including the 
case of parallel projection. 
1 Introduction

The problem we address in this paper is that of recover- 
ing relative, non-metric, structure of a three-dimensional 
scene from two images, taken from different viewing po- 
sitions. The relative structure information is in the form 
of an invariant that can be computed without any prior 
knowledge of camera geometry, and under all central pro- 
jections — including the case of parallel projection. The 
non-metric nature of the invariant allows the cameras to 
be internally uncalibrated (intrinsic parameters of cam- 
era are unknown). The unique nature of the invariant al- 
lows the system to make no assumptions about existence 
of perspective distortions in the input images. Therefore, 
any degree of perspective distortions is allowed, i.e., or- 
thographic and perspective projections are treated alike, 
or in other words, no assumptions are made on the size 
of field of view. 

We envision this study as having applications both in 
the area of structure from motion and in the area of 
visual recognition. In structure from motion our contri- 
bution is an addition to the recent studies of non-metric 
structure from motion pioneered by Koenderink and Van 
Doorn (1991) in parallel projection, followed by Faugeras
(1992) and Mohr, Quan, Veillon & Boufama (1992) for
reconstructing the projective coordinates of a scene up 
to an unknown projective transformation of 3D projec- 
tive space. Our approach is similar to Koenderink and 
Van Doorn's in the sense that we derive an invariant,
based on a geometric construction, that records the 3D 
structure of the scene as a variation from two fixed ref- 
erence planes measured along the line of sight. Unlike 
Faugeras and Mohr et al. we do not recover the projec- 
tive coordinates of the scene, and, as a result, we use a 
smaller number of corresponding points: in addition to 
the location of epipoles we need only four correspond- 
ing points, coming from four non-coplanar points in the 
scene, whereas Faugeras and Mohr et al. require corre- 
spondences coming from five points in general position. 

The second contribution of our study is to visual recog- 
nition of 3D objects from 2D images. We show that our 
projective invariant can be used to predict novel views of 
the object, given two model views in full correspondence 
and a small number of corresponding points with the 
novel view. The predicted view is then matched against 
the novel input view, and if the two match, then the 
novel view is considered to be an instance of the same ob- 
ject that gave rise to the two model views stored in mem- 
ory. This paradigm of recognition is within the general 
framework of alignment (Fischler and Bolles 1981, Lowe 
1985, Ullman 1989, Huttenlocher and Ullman 1987) and, 
more specifically, of the paradigm proposed by Ullman 
and Basri (1989) that recognition can proceed using only 
2D images, both for representing the model, and when 
matching the model to the input image. We refer to the 
problem of predicting a novel view from a set of model 
views using a limited number of corresponding points, 
as the problem of re-projection. 

The problem of re-projection has been dealt with in 
the past primarily assuming parallel projection (Ullman
and Basri 1989, Koenderink and Van Doorn 1991).
For the more general case of central projection, Barrett,
Brill, Haag & Payton (1991) have recently introduced a
quadratic invariant based on the fundamental matrix of 
Longuet-Higgins (1981), which is computed from eight 
corresponding points. In Appendix E we show that 
their result is equivalent to intersecting epipolar lines, 
and therefore, is singular for certain viewing transfor- 
mations depending on the viewing geometry between the 
two model views. Our projective invariant is not based 
on an epipolar intersection, but is based directly on the 
relative structure of the object, and does not suffer from 
any singularities, a finding that implies greater stability 
in the presence of errors. 

The projective structure invariant, and the re-projection
method that follows, are based on an extension of
Koenderink and Van Doorn's representation of affine
structure as an invariant defined with respect to a
reference plane and a reference point. We start by
introducing an alternative affine invariant that uses two
reference planes (section 5) and that can easily be extended
to projective space. As a result we obtain a projective
structure invariant (section 6).

We show that the difference between the affine and 
projective cases lies entirely in the location of the epipoles,
i.e., given the location of epipoles both the affine and 
projective structures are constructed by linear methods 
using the information captured from four corresponding 
points projected from four non-coplanar points in space. 
In the projective case we need additional corresponding 
points — solely for the purpose of recovering the location 
of the epipoles (Theorem 1, section 6). 

We show that the projective structure invariant can 
be recovered from two views — produced by parallel or 
central projection — and six corresponding points, four 
of which are assumed to be projected from four coplanar 
points in space (section 7.1). Alternatively, the projec- 
tive structure can be recovered from eight corresponding 
points, without assuming coplanarity of object points 
(section 8.1). The 8-point method uses the fundamental 
matrix approach (Longuet-Higgins, 1981) for recovering
the location of epipoles (as suggested by Faugeras,
1992). 

Finally, we show that, for both schemes, it is possible 
to limit the viewing transformations to the group of rigid 
motions, i.e., it is possible to work with perspective pro- 
jection assuming the cameras are calibrated. The result, 
however, does not include orthographic projection. 

Experiments were conducted with both algorithms, 
and the results show that the 6-point algorithm is sta- 
ble under noise and under conditions that violate the 
assumption that four object points are coplanar. The 8- 
point algorithm, although theoretically superior because 
of lack of the coplanarity assumption, is considerably 
more sensitive to noise. 

2 Why not Classical SFM? 

The work of Koenderink and Van Doorn (1991) on affine
structure from two orthographic views, and the work of 
Ullman and Basri (1989) on re-projection from two or- 
thographic views, have a clear practical aspect: it is 
known that at least three orthographic views are required
to recover metric structure, i.e., relative depth
(Ullman 1979, Huang & Lee 1989, Aloimonos & Brown
1989). Therefore, the suggestion to use affine structure
instead of metric structure allows a recognition system
to perform re-projection from two model views (Ullman
& Basri), and to generate novel views of the object
produced by affine transformations in space, rather than by
rigid transformations (Koenderink & Van Doorn).

This advantage, of working with two rather than three 
views, is not present under perspective projection, how- 
ever. It is known that two perspective views are sufficient 
for recovering metric structure (Roach & Aggarwal 1979, 
Longuet-Higgins 1981, Tsai & Huang 1984, Faugeras &
Maybank 1990). The question, therefore, is why look for 
alternative representations of structure, and new meth- 
ods for performing re-projection? 

There are three major problems in structure from mo- 
tion methods: (i) critical dependence on an orthographic 
or perspective model of projection, (ii) internal camera 
calibration, and (iii) the problem of stereo-triangulation. 

The first problem is the strict division between meth- 
ods that assume orthographic projection and methods 
that assume perspective projection. These two classes 
of methods do not overlap in their domain of application.
The perspective model operates under conditions
of significant perspective distortions, such as driving on
a stretch of highway, and requires a relatively large field of
view and relatively large depth variations between scene
points (Adiv 1989, Dutta & Snyder 1990, Tomasi 1991,
Broida et al. 1990). The orthographic model, on the
other hand, provides a reasonable approximation when 
the imaging situation is at the other extreme, i.e., small 
field of view and small depth variation between object 
points (a situation for which perspective schemes often 
break down). Typical imaging situations are at neither 
end of these extremes and, therefore, would be vulner- 
able to errors in both models. From the standpoint of 
performing recognition, this problem implies that the 
viewer has control over his field of view — a property 
that may be reasonable to assume at the time of model 
acquisition, but is less reasonable to assume at
recognition time.

The second problem is related to internal camera cal- 
ibration. The assumption of perspective projection in- 
cludes a distinguishable point, known as the principal 
point, which is at the intersection of the optical axis and 
the image plane. The location of the principal point is 
an internal parameter of the camera, which may deviate 
somewhat from the geometric center of the image plane, 
and therefore, may require calibration. Perspective pro- 
jection also assumes that the image plane is perpendicu- 
lar to the optical axis and the possibility of imperfections 
in the camera requires, therefore, the recovery of the two 
axes describing the image frame, and of the focal length. 
Although the calibration process is somewhat tedious, it 
is sometimes necessary for many of the available com- 
mercial cameras (Brown 1971, Faig 1975, Lenz and Tsai 
1987, Faugeras, Luong and Maybank 1992). The problem
of calibration is less severe under orthographic projection
because the projection does not have a distinguishable
ray, so any point can serve as an origin. Calibration
must still be considered, however, because of the assumption that
the image plane is perpendicular to the projecting rays.

The third problem is related to the way shape is 
typically represented under the perspective projection 
model. Because the center of projection is also the origin
of the coordinate system for describing shape, the
shape difference (e.g., the difference in depth between two
object points) is orders of magnitude smaller than the
distance to the scene, and this makes the computations
very sensitive to noise. The sensitivity to noise is reduced
if images are taken from distant viewpoints (large
base-line in stereo triangulation), but that makes the 
process of establishing correspondence between points in 
both views more of a problem, and hence, may make the 
situation even worse. This problem does not occur un- 
der the assumption of orthographic projection because 
translation in depth is lost under orthographic projec- 
tion, and therefore, the origin of the coordinate system 
for describing shape (metric and non-metric) is object- 
centered, rather than viewer-centered (Tomasi, 1991). 

These problems, in isolation or together, account for
much of the sensitivity of structure from
motion methods to errors. The recent work of Faugeras
(1992) and Mohr et al. (1992) addresses the problem of 
internal calibration by assuming central projection in- 
stead of perspective projection. Faugeras and Mohr et 
al. then proceed to reconstruct the projective coordi- 
nates of the scene. Since projective coordinates are mea- 
sured relative to the center of projection, this approach 
does not address the problem of stereo-triangulation or 
the problem of uniformity under both orthographic and 
perspective projection models. 

3 Camera Model and Notations 

We assume that objects in the world are rigid and are 
viewed under central projection. In central projection 
the center of projection is the origin of the camera coor- 
dinate frame and can be located anywhere in projective 
space. In other words, the center of projection can be 
a point in Euclidean space or an ideal point (such as 
happens in parallel projection). The image plane is as- 
sumed to be arbitrarily positioned with respect to the 
camera coordinate frame (unlike perspective projection 
where it is parallel to the xy plane). We refer to this as a 
non-rigid camera configuration. The motion of the cam- 
era, therefore, consists of the translation of the center of 
projection, rotation of the coordinate frame around the 
new location of the center of projection, followed by
tilt, pan, and focal length scale of the image plane with 
respect to the new optical axis. This model of projection 
will also be referred to as perspective projection with an 
uncalibrated camera. 

We also include in our derivations the possibility of 
having a rigid camera configuration. A rigid camera is 
simply the familiar model of perspective projection in 
which the center of projection is a point in Euclidean 
space and the image plane is fixed with respect to the 
camera coordinate frame. A rigid camera motion, there- 
fore, consists of translation of the center of projection 
followed by rotation of the coordinate frame and focal 
length scaling. Note that a rigid camera implicitly assumes
internal calibration, i.e., the optical axis pierces
through a fixed point in the image and the image plane
is perpendicular to the optical axis.

Figure 1: Koenderink and Van Doorn's Affine Structure.

We denote object points in capital letters and image 
points in small letters. If P denotes an object point in 3D
space, p, p', p'' denote its projections onto the first, second
and novel views, respectively. We treat image
points as rays (homogeneous coordinates) in 3D space,
and refer to the notation p = (x, y, 1) as the standard
representation of the image plane. We note that the 
true coordinates of the image plane are related to the 
standard representation by means of a projective trans- 
formation of the plane. In case we deal with central 
projection, all representations of image coordinates are 
allowed, and therefore, without loss of generality we work 
with the standard representation (more on that in Ap- 
pendix A). 

4 Affine Structure: Koenderink and
Van Doorn's Version

The affine structure invariant described by Koenderink 
and Van Doorn (1991) is based on a geometric construction
using a single reference plane, and a reference
point not coplanar with the reference plane. In affine 
geometry (induced by parallel projection), it is known 
from the fundamental theorem of plane projectivity, that 
three (non-collinear) corresponding points are sufficient 
to uniquely determine all other correspondences (see Ap- 
pendix A for more details on plane projectivity under 
affine and projective geometry). Using three correspond- 
ing points between two views provides us, therefore, with 
a transformation (affine transformation) for determining 
the location of all points of the plane passing through 
the three reference points in the second image plane. 

Let P be an arbitrary point in the scene projecting
onto p, p' on the two image planes. Let $\tilde{P}$ be the
projection of P onto the reference plane along the ray towards
the first image plane, and let $\tilde{p}'$ be the projection of $\tilde{P}$
onto the second image plane ($p'$ and $\tilde{p}'$ coincide if P is
on the reference plane). Note that the location of $\tilde{p}'$ is
known via the affine transformation determined by the
projections of the three reference points. Finally, let
Q be the fourth reference point (not on the reference
plane). Using a simple geometric drawing, the affine
structure invariant is derived as follows.

Consider Figure 1. The projections of the reference
point Q and an arbitrary point of interest P form two
similar trapezoids: $P \tilde{P} p' \tilde{p}'$ and $Q \tilde{Q} q' \tilde{q}'$. From similarity
of trapezoids we have,

$$\gamma_p = \frac{|P - \tilde{P}|}{|Q - \tilde{Q}|} = \frac{|p' - \tilde{p}'|}{|q' - \tilde{q}'|}.$$

By assuming that q, q' is a given corresponding point, we
obtain a shape invariant that is invariant under parallel
projection (the object points are fixed while the camera
changes the location and orientation of the image plane
relative to the projecting rays).

Before we extend this result to central projection by 
using projective geometry, we first describe a different 
affine invariant using two reference planes, rather than 
one reference plane and a reference point. The new affine 
invariant is the one that will be applied later to central 
projection. 

5 Affine Structure Using Two 
Reference Planes 

We make use of the same information — the projections 
of four non-coplanar points — to set up two reference 
planes. Let $P_j$, $j = 1, ..., 4$, be the four non-coplanar
reference points in space, and let $p_j \leftrightarrow p'_j$ be their
observed projections in both views. The points $P_1, P_2, P_3$
and $P_2, P_3, P_4$ lie on two different planes, therefore, we
can account for the motion of all points coplanar with
each of these two planes. Let P be a point of interest,
not coplanar with either of the reference planes, and let
$\tilde{P}$ and $\hat{P}$ be its projections onto the two reference planes
along the ray towards the first view.

Consider Figure 2. The projection of $P$, $\tilde{P}$ and $\hat{P}$ onto
$p'$, $\tilde{p}'$ and $\hat{p}'$ respectively, gives rise to two similar
trapezoids from which we derive the following relation:

$$\alpha_p = \frac{|P - \tilde{P}|}{|P - \hat{P}|} = \frac{|p' - \tilde{p}'|}{|p' - \hat{p}'|}.$$

The ratio $\alpha_p$ is invariant under parallel projection. There
is no particular advantage in preferring $\alpha_p$ over $\gamma_p$ as
a measure of affine structure, but as will be described
below, this new construction forms the basis for extending
affine structure to projective structure, whereas the
single reference plane construction does not (see Appendix
D for proof).

In the projective plane, we need four coplanar points 
to determine the motion of a reference plane. We show 
that, given the epipoles, only three corresponding points 
for each reference plane are sufficient for recovering the 
associated projective transformations induced by those 
planes. Altogether, the construction provides us with 
four points along each epipolar line. The similarity of 
trapezoids in the affine case turns, therefore, into a cross- 
ratio in the projective case. 
Figure 3: Definition of projective shape as the cross-ratio
of $p'$, $\tilde{p}'$, $\hat{p}'$, $V_l$.



Figure 2: Affine structure using two reference planes. 



This leads to the result (Theorem 1) that, in addition 
to the epipoles, only four corresponding points, projected 
from four non-coplanar points in the scene, are sufficient 
for recovering the projective structure invariant for all 
other points. The epipoles can be recovered by either 
extending the Koenderink and Van Doorn (1991) construction
to projective space using six points (four of
which are assumed to be coplanar), or by using other 
methods, notably those based on the Longuet-Higgins 
fundamental matrix. This leads to projective structure 
from eight points in general position. 

6 Projective Structure 

We assume for now that the location of both epipoles is 
known, and we will address the problem of finding the 
epipoles later. The epipoles, also known as the foci of ex- 
pansion, are the intersections of the line in space connect- 
ing the two centers of projection and the image planes. 
There are two epipoles, one on each image plane — the 
epipole on the second image we call the left epipole, and 
the epipole on the first image we call the right epipole. 
The image lines emanating from the epipoles are known 
as the epipolar lines. 

Consider Figure 3 which illustrates the two reference
plane construction, defined earlier for parallel projection,
now displayed in the case of central projection. The
left epipole is denoted by $V_l$, and because it is on the
line $V_1 V_2$ (connecting the two centers of projection), the
line $P V_1$ projects onto the epipolar line $p' V_l$. Therefore,
the points $\tilde{P}$ and $\hat{P}$ project onto the points $\tilde{p}'$ and $\hat{p}'$,
which are both on the epipolar line $p' V_l$. The points
$p'$, $\tilde{p}'$, $\hat{p}'$ and $V_l$ are collinear and projectively related to
$P$, $\tilde{P}$, $\hat{P}$, $V_1$, and therefore have the same cross-ratio:

$$\alpha_p = \frac{|P - \tilde{P}|}{|P - \hat{P}|} \cdot \frac{|V_1 - \hat{P}|}{|V_1 - \tilde{P}|} = \frac{|p' - \tilde{p}'|}{|p' - \hat{p}'|} \cdot \frac{|V_l - \hat{p}'|}{|V_l - \tilde{p}'|}.$$

Note that when the epipole $V_l$ becomes an ideal point
(vanishes along the epipolar line), then $\alpha_p$ is the same
as the affine invariant defined in section 5 for parallel
projection.
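
The limit can be written out in one step. The following is a short worked sketch, assuming that $p'$, $\tilde{p}'$ and $\hat{p}'$ stay fixed while $V_l$ recedes to infinity along the epipolar line, and writing the cross-ratio with the tilde and hat decorations for the two reference planes:

```latex
\alpha_p = \frac{|p' - \tilde{p}'|}{|p' - \hat{p}'|} \cdot \frac{|V_l - \hat{p}'|}{|V_l - \tilde{p}'|},
\qquad
\lim_{|V_l| \to \infty} \frac{|V_l - \hat{p}'|}{|V_l - \tilde{p}'|} = 1
\;\Longrightarrow\;
\alpha_p \to \frac{|p' - \tilde{p}'|}{|p' - \hat{p}'|}.
```

The second factor tends to 1 because both distances grow without bound while their difference stays bounded by $|\tilde{p}' - \hat{p}'|$.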

The cross-ratio $\alpha_p$ is a direct extension of the affine
structure invariant defined in section 5 and is referred 
to as projective structure. We can use this invariant to 
reconstruct any novel view of the object (taken by a 
non-rigid camera) without ever recovering depth or even 
projective coordinates of the object. 

Having defined the projective shape invariant, and as- 
suming we still are given the locations of the epipoles, 
we show next how to recover the projections of the two 
reference planes onto the second image plane, i.e., we 
describe the computations leading to $\tilde{p}'$ and $\hat{p}'$.

Since we are working under central projection, we 
need to identify four coplanar points on each reference 
plane. In other words, in the projective geometry of the 
plane, four corresponding points, no three of which are 
collinear, are sufficient to determine uniquely all other 
correspondences (see Appendix A, for more details). We 
must, therefore, identify four corresponding points that 
are projected from four coplanar points in space, and 
then recover the projective transformation that accounts 
for all other correspondences induced from that plane. 
The following proposition states that the corresponding 
epipoles can be used as a fourth corresponding point for 
any three corresponding points selected from the pair of 
images. 

Proposition 1 A projective transformation, A, that is
determined from three arbitrary, non-collinear, corresponding
points and the corresponding epipoles, is a projective
transformation of the plane passing through the
three object points which project onto the corresponding
image points. The transformation A is an induced
epipolar transformation, i.e., the ray Ap intersects the
epipolar line $p' V_l$ for any arbitrary image point p and its
corresponding point p'.

Comment: An epipolar transformation F is a mapping
between corresponding epipolar lines and is determined
(not uniquely) from three corresponding epipolar lines
and the epipoles. The induced point transformation is
$E = (F^{-1})^t$ (induced from the point/line duality of projective
geometry, see Appendix C for more details on
epipolar transformations).

Proof: Let $p_j \leftrightarrow p'_j$, $j = 1, 2, 3$, be three arbitrary
corresponding points, and let $V_l$ and $V_r$ denote the left
and right epipoles. First note that the four points $p_j$ and
$V_r$ are projected from four coplanar points in the scene.
The reason is that the plane defined by the three object
points $P_j$ intersects the line $V_1 V_2$ connecting the two
centers of projection, at a point, regular or ideal. That
point projects onto both epipoles. The transformation
A, therefore, is a projective transformation of the plane
passing through the three object points $P_1, P_2, P_3$. Note
that A is uniquely determined provided that no three of
the four points are collinear.

Let $\tilde{p}' = Ap$ for some arbitrary point p. Because lines
are projective invariants, any point along the epipolar
line $p V_r$ must project onto the epipolar line $p' V_l$. Hence,
A is an induced epipolar transformation. □

Given the epipoles, therefore, we need just three points 
to determine the correspondences of all other points 
coplanar with the reference plane passing through the 
three corresponding object points. The transformation 
(collineation) A is determined from the following equa- 
tions: 

$$A p_j = \rho_j p'_j, \quad j = 1, 2, 3$$

$$A V_r = \rho V_l,$$

where $\rho, \rho_j$ are unknown scalars, and $A_{3,3} = 1$. One
can eliminate $\rho, \rho_j$ from the equations and solve for the
matrix A from the three corresponding points and the 
corresponding epipoles. That leads to a linear system 
of eight equations, and is described in more detail in 
Appendix A. 
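
To make the linear system concrete, here is a minimal sketch in plain Python. It is not the procedure of Appendix A: the function names `solve_linear` and `collineation` are hypothetical, the unknown scalars are eliminated by requiring that the cross product of $A p_j$ and $p'_j$ vanishes, and the sketch assumes that $A_{3,3} \neq 0$ so that the normalization $A_{3,3} = 1$ is admissible.

```python
def solve_linear(M, b):
    """Solve the square system M x = b by Gauss-Jordan elimination
    with partial pivoting."""
    n = len(b)
    aug = [list(M[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("degenerate configuration (three points collinear?)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= f * aug[col][c]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def collineation(src, dst):
    """Return the 3x3 matrix A (with A[2][2] = 1) such that A*src[i] is
    proportional to dst[i] for four homogeneous correspondences.

    Each correspondence contributes two of the three equations
    (A s) x t = 0; the one that does not involve the largest component
    of t is dropped, so ideal points (third component 0) are allowed.
    """
    rows, rhs = [], []
    for s, t in zip(src, dst):
        # Coefficients of (a11..a13, a21..a23, a31..a33) in each equation.
        eqs = [
            [0, 0, 0] + [-t[2] * v for v in s] + [t[1] * v for v in s],
            [t[2] * v for v in s] + [0, 0, 0] + [-t[0] * v for v in s],
            [-t[1] * v for v in s] + [t[0] * v for v in s] + [0, 0, 0],
        ]
        drop = max(range(3), key=lambda i: abs(t[i]))
        for i in range(3):
            if i != drop:
                rows.append(eqs[i][:8])   # unknowns a11..a32
                rhs.append(-eqs[i][8])    # a33 = 1 moved to the right side
    a = solve_linear(rows, rhs)
    return [a[0:3], a[3:6], a[6:8] + [1.0]]
```

In the setting of the text, `src` would hold $p_1, p_2, p_3, V_r$ and `dst` would hold $p'_1, p'_2, p'_3, V_l$, each as a list of three numbers.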

If $P_1, P_2, P_3$ define the first reference plane, the
transformation A determines the location of $\tilde{p}'$ for all other
points p ($p'$ and $\tilde{p}'$ coincide if P is coplanar with the first
reference plane). In other words, we have that $\tilde{p}' = Ap$.
Note that $\tilde{p}'$ is not necessarily a point on the second image
plane, but it is on the line $V_2 \tilde{P}$. We can determine
its location on the second image plane by normalizing Ap such
that its third component is set to 1.

Similarly, let $P_2, P_3, P_4$ define the second reference
plane (assuming the four object points $P_j$, $j = 1, ..., 4$,
are non-coplanar). The transformation E is uniquely
determined by the equations

$$E p_j = \rho_j p'_j, \quad j = 2, 3, 4$$

$$E V_r = \rho V_l,$$

and determines all other correspondences induced by the
second reference plane (we assume that no three of the
four points used to determine E are collinear). In other
words, Ep determines the location of $\hat{p}'$ up to a scale
factor along the ray $V_2 \hat{P}$.

Instead of normalizing Ap and Ep we compute $\alpha_p$
from the cross-ratio of the points represented in homogeneous
coordinates, i.e., the cross-ratio of the four rays
$V_2 p'$, $V_2 \tilde{p}'$, $V_2 \hat{p}'$, $V_2 V_l$, as follows: Let the rays $p'$, $V_l$ be
represented as a linear combination of the rays $\tilde{p}' = Ap$
and $\hat{p}' = Ep$, i.e.,

$$p' = \tilde{p}' + k \hat{p}'$$

$$V_l = \tilde{p}' + k' \hat{p}',$$

then $\alpha_p = k / k'$ (see Appendix B for more details). This
way of computing the cross-ratio is preferred over the
more familiar cross-ratio of four collinear points, because
it enables us to work with all elements of the projective
plane, including ideal points (a situation that arises, for
instance, when epipolar lines are parallel, and in general
under parallel projection).
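
The computation of $k / k'$ can be sketched as follows. This is an illustrative sketch under stated assumptions, not code from the paper: the function names are hypothetical, each ray is a list of three numbers, and because rays are defined only up to scale, each of $p'$ and $V_l$ is written as $a \tilde{p}' + b \hat{p}'$ and the coefficient is taken as $b / a$. The result is the signed cross-ratio; its absolute value corresponds to the expression written with absolute distances.

```python
def ray_coefficients(target, u, w):
    """Least-squares (a, b) such that target ~ a*u + b*w for 3-vectors,
    obtained from the 2x2 normal equations."""
    uu = sum(x * x for x in u)
    ww = sum(x * x for x in w)
    uw = sum(x * y for x, y in zip(u, w))
    ut = sum(x * y for x, y in zip(u, target))
    wt = sum(x * y for x, y in zip(w, target))
    det = uu * ww - uw * uw
    if abs(det) < 1e-12:
        raise ValueError("the two reference rays coincide")
    a = (ut * ww - wt * uw) / det
    b = (wt * uu - ut * uw) / det
    return a, b


def projective_structure(p_prime, p_tilde, p_hat, v_left):
    """Signed cross-ratio k / k' of the four rays.

    p_tilde = A p and p_hat = E p need not be normalized: rescaling
    either of them multiplies k and k' by the same factor.
    Assumes p_prime and v_left do not coincide with p_hat (a != 0)
    and that v_left does not coincide with p_tilde (k' != 0).
    """
    a, b = ray_coefficients(p_prime, p_tilde, p_hat)
    c, d = ray_coefficients(v_left, p_tilde, p_hat)
    k = b / a
    k_prime = d / c
    return k / k_prime
```

An ideal epipole is passed like any other ray, for example `[1.0, 0.0, 0.0]` for epipolar lines parallel to the x axis, and no special case is needed.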

We have therefore shown the following result: 

Theorem 1 In the case where the locations of the epipoles
are known, four corresponding points, coming from
four non-coplanar points in space, are sufficient for computing
the projective structure invariant $\alpha_p$ for all other
points in space projecting onto corresponding points in
both views, for all central projections, including parallel
projection.

This result shows that the difference between parallel 
and central projection lies entirely in the epipoles. In
both cases four non-coplanar points are sufficient for ob- 
taining the invariant, but in the parallel projection case 
we have prior knowledge that both epipoles are ideal, 
therefore they are not required for determining the trans- 
formations A and E (in other words, A and E are affine 
transformations, more on that in Section 7.2). 

Another point to note with this result is that the 
minimal number of corresponding points needed for re- 
projection is smaller than the previously reported num- 
ber (Faugeras 1992, Mohr et al. 1992) for recovering 
the projective coordinates of object points. Faugeras 
shows that five corresponding points coming from five 
points in general position (i.e., no four of them are copla- 
nar) can be used, together with the epipoles, to recover 
the projective coordinates of all other points in space. 
Because the projective structure invariant requires only 
four points, this implies that re-projection is done more 
directly than through full reconstruction of projective 
coordinates, and therefore is likely to be more stable. 

We next discuss algorithms for recovering the loca- 
tion of epipoles. The problem of recovering the epipoles 
is well known and several approaches have been sug- 
gested in the past (Longuet-Higgins and Prazdny 1980, 
Rieger and Lawton 1985, Faugeras and Maybank 1990, Hildreth
1991, Faugeras 1992, Faugeras, Luong and Maybank
1992). We start with a method that requires six
corresponding points (two additional points to the four 
we already have). The method is a direct extension of the 
Koenderink and Van Doorn (1991) construction in parallel
projection, and was described earlier by Lee (1988)
for the purpose of recovering the translational compo- 
nent of camera motion. 

The second algorithm for locating the epipoles is 
adopted from Faugeras (1992) and is based on the fun- 
damental matrix of Longuet-Higgins (1981). 

7 Epipoles from Six Points 

We can recover the correspondences induced from the
first reference plane by selecting four corresponding
points, assuming they are projected from four coplanar
object points. Let $p_j = (x_j, y_j, 1)$ and $p'_j = (x'_j, y'_j, 1)$,
$j = 1, ..., 4$, represent the standard image coordinates
of the four corresponding points, no three of which are
collinear, in both projections. Therefore, the transformation
A is uniquely determined by the following equations,

$$\rho_j p'_j = A p_j.$$

Let $\tilde{p}' = Ap$ be the homogeneous coordinate representation
of the ray $V_2 \tilde{P}$, and let $\tilde{p}^{-1} = A^{-1} p'$.

Figure 4: The geometry of locating the left epipole using
two points out of the reference plane.

Having accounted for the motion of the reference 
plane, we can easily find the location of the epipoles (in 
standard coordinates). Given two object points $P_5, P_6$
that are not on the reference plane, we can find both
epipoles by observing that $\tilde{p}'$ is on the left epipolar
line, and similarly that $\tilde{p}^{-1}$ is on the right epipolar line.
Stated formally, we have the following proposition: 

Proposition 2 The left epipole, denoted by $V_l$, is at the
intersection of the line $p'_5 \tilde{p}'_5$ and the line $p'_6 \tilde{p}'_6$. Similarly,
the right epipole, denoted by $V_r$, is at the intersection of
$p_5 \tilde{p}_5^{-1}$ and $p_6 \tilde{p}_6^{-1}$.

Proof: It is sufficient to prove the claim for one of the
epipoles, say the left epipole. Consider Figure 4 which
describes the construction geometrically. By construction,
the line $P_5 \tilde{P}_5 V_1$ projects to the line $p'_5 \tilde{p}'_5$ via $V_2$
(points and lines are projective invariants) and therefore
they are coplanar. In particular, $V_1$ projects to $V_l$ which
is located at the intersection of $p'_5 \tilde{p}'_5$ and $V_1 V_2$. Similarly,
the line $p'_6 \tilde{p}'_6$ intersects $V_1 V_2$ at a point of the second
image plane. Finally, this point and $V_l$
must coincide because the two lines $p'_5 \tilde{p}'_5$ and $p'_6 \tilde{p}'_6$ are
coplanar (both are on the image plane). □

Algebraically, we can recover the ray $V_1V_2$, or $V_l$ up to
a scale factor, using the following formula:

$V_l = (p'_5 \times \tilde{p}'_5) \times (p'_6 \times \tilde{p}'_6)$.

Note that $V_l$ is defined with respect to the standard coordinate
frame of the second camera. We treat the epipole
$V_l$ as the ray $V_1V_2$ with respect to $V_2$, and the epipole
$V_r$ as the same ray but with respect to $V_1$. Note also
that the third component of $V_l$ is zero if epipolar lines
are parallel, i.e., $V_l$ is an ideal point in projective terms
(this happens under parallel projection, or when the non-rigid
camera motion brings the image plane to a position
where it is parallel to the line $V_1V_2$).
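The cross-product formula for the epipole can be illustrated with a small sketch. It is only an illustration under stated assumptions: the function names and the sample coordinates are hypothetical, points are homogeneous triples $(x, y, 1)$, and `q5`, `q6` stand for the transferred points $Ap_5$, $Ap_6$ of the text.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def epipole_from_two_lines(p5, q5, p6, q6):
    """Intersect the line through p5, q5 with the line through p6, q6.

    The cross product of two homogeneous points is the line joining them,
    and the cross product of two lines is their intersection point.
    The result is defined only up to a scale factor.
    """
    line5 = cross(p5, q5)
    line6 = cross(p6, q6)
    return cross(line5, line6)


def is_ideal(v, tol=1e-12):
    """True when the third coordinate vanishes (a point at infinity)."""
    return abs(v[2]) <= tol


# Hypothetical data: the lines y = x + 1 and x = 2 meet at (2, 3).
v = epipole_from_two_lines((0, 1, 1), (4, 5, 1), (2, 0, 1), (2, 7, 1))
finite_epipole = (v[0] / v[2], v[1] / v[2])

# Hypothetical data: two parallel lines, so the intersection is ideal.
w = epipole_from_two_lines((0, 0, 1), (1, 1, 1), (0, 2, 1), (1, 3, 1))
```

In the second call the third component of the result is zero, which is the parallel-epipolar-lines case described above; only the direction given by the first two components is meaningful there.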



In the case where more than two epipolar lines are
available (such as when more than six corresponding
points are available), one can find a least-squares solution
for the epipole by using a principal component
analysis, as follows. Let $B$ be a $k \times 3$ matrix, where
each row represents an epipolar line. The least-squares
solution for $V_l$ is the unit eigenvector associated with the
smallest eigenvalue of the $3 \times 3$ matrix $B^\top B$. Note that
this can be done analytically because the characteristic
equation is a cubic polynomial.
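A minimal sketch of this least-squares step follows. Note the substitution: instead of solving the cubic characteristic equation analytically, it uses power iteration on a shifted matrix, which is simpler to write without a linear-algebra library. The function name, the iteration count, and the sample lines are assumptions made for illustration.

```python
import math


def smallest_eigenvector(rows, iterations=2000):
    """Unit vector v minimising the sum of (row . v)^2 over all rows.

    `rows` plays the role of the k x 3 matrix B, one epipolar line per row.
    With M = B^T B (symmetric, positive semi-definite), the matrix
    trace(M) * I - M has the same eigenvectors as M, and its dominant
    eigenvector is the one belonging to the smallest eigenvalue of M.
    Assumes the start vector is not orthogonal to the solution.
    """
    m = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    trace = m[0][0] + m[1][1] + m[2][2]
    shifted = [[(trace if i == j else 0.0) - m[i][j] for j in range(3)]
               for i in range(3)]
    v = [1.0, 1.0, 1.0]
    for _ in range(iterations):
        w = [sum(shifted[i][j] * v[j] for j in range(3)) for i in range(3)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            raise ValueError("degenerate set of lines")
        v = [c / norm for c in w]
    return v


# Hypothetical noise-free data: four lines a*x + b*y + c = 0 through (2, 3).
lines = [(1, 0, -2), (0, 1, -3), (1, 1, -5), (1, -1, 1)]
epipole = smallest_eigenvector(lines)
epipole_xy = (epipole[0] / epipole[2], epipole[1] / epipole[2])
```

With noisy lines the same call returns the point that minimises the summed squared algebraic residuals; convergence slows down when the two smallest eigenvalues are close to each other.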

Altogether, we have a six-point algorithm for recovering
both the epipoles and the projective structure $\alpha_p$,
and for performing re-projection onto any novel view.
We summarize the 6-point algorithm in the following
section.

7.1 Re-projection Using Projective Structure: 
6-point Algorithm 

We assume we are given two model views of a 3D object, 
and that all points of interest are in correspondence. We 
assume these correspondences can be based on measures 
of correlation, as used in optical-flow methods (see also 
Shashua 1991, Bachelder & Ullman 1992 for methods for
extracting correspondences using a combination of optical
flow and affine geometry).

Given a novel view we extract six corresponding points
(with one of the model views): $p_j \leftrightarrow p'_j \leftrightarrow p''_j$,
$j = 1, \ldots, 6$. We assume the first four points are projected
from four coplanar points, and the other corresponding 
points are projected from points that are not on the ref- 
erence plane. Without loss of generality, we assume the 
standard coordinate representation of the image planes, 
i.e., the image coordinates are embedded in a 3D vec- 
tor whose third component is set to 1 (see Appendix A). 
The computations for recovering projective shape and 
performing re-projection are described below. 

1: Recover the transformation $A$ that satisfies $\rho_j p'_j = Ap_j$,
$j = 1, \ldots, 4$. This requires setting up a linear
system of eight equations (see Appendix A). Apply
the transformation to all points $p$, denoting $\tilde{p}' = Ap$.
Also recover the epipoles $V_l = (p'_5 \times \tilde{p}'_5) \times (p'_6 \times \tilde{p}'_6)$
and $V_r = (p_5 \times A^{-1}p'_5) \times (p_6 \times A^{-1}p'_6)$.

2: Recover the transformation $E$ that satisfies $\rho V_l = EV_r$
and $\rho_j p'_j = Ep_j$, $j = 4, 5, 6$.

3: Compute the cross-ratio of the points $p', Ap, Ep, V_l$,
for all points $p$ and denote that by $\alpha_p$ (see Appendix B
for details on computing the cross-ratio
of four rays).

4: Perform step 1 between the first and novel view:
recover $A$ that satisfies $\rho_j p''_j = Ap_j$, $j = 1, \ldots, 4$,
apply $A$ to all points $p$ and denote that by $\tilde{p}'' = Ap$,
recover the epipoles $V_{ln} = (p''_5 \times \tilde{p}''_5) \times (p''_6 \times \tilde{p}''_6)$ and
$V_{rn} = (p_5 \times A^{-1}p''_5) \times (p_6 \times A^{-1}p''_6)$.

5: Perform step 2 between the first and novel view:
recover the transformation $E$ that satisfies $\rho V_{ln} = EV_{rn}$
and $\rho_j p''_j = Ep_j$, $j = 4, 5, 6$.

6: For every point $p$, recover $p''$ from the cross-ratio $\alpha_p$
and the three rays $Ap, Ep, V_{ln}$. Normalize $p''$ such
that its third coordinate is set to 1.

The entire procedure requires setting up a linear system
of eight equations four times (Steps 1, 2, 4, 5) and computing
cross-ratios (linear operations as well).
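The linear system of eight equations used in Steps 1, 2, 4 and 5 can be sketched as follows. This is an illustrative sketch rather than the paper's own formulation: it assumes the common normalisation in which the bottom-right entry of the matrix is fixed to 1 (which excludes the rare case where that entry is zero), and all function names and sample values are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def projectivity_from_four_points(src, dst):
    """3x3 matrix A with rho_j * dst_j = A * src_j, scaled so A[2][2] = 1.

    src and dst are lists of four (x, y) points, no three collinear.
    Each correspondence contributes two linear equations in the eight
    unknown entries of A.
    """
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rhs.append(u)
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.append(v)
    h = solve_linear(rows, rhs)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_projectivity(a, point):
    """Map an (x, y) point through A and set the third coordinate back to 1."""
    x, y = point
    u, v, w = (row[0] * x + row[1] * y + row[2] for row in a)
    return (u / w, v / w)


# Hypothetical check: recover a known projectivity from four point pairs.
true_a = [[2.0, 0.0, 1.0], [0.0, 3.0, 2.0], [0.1, 0.2, 1.0]]
src = [(0, 0), (1, 0), (1, 1), (0, 1)]
dst = [apply_projectivity(true_a, p) for p in src]
estimated = projectivity_from_four_points(src, dst)
```

Once the matrix is available, `apply_projectivity` transfers any other point of the first view, which is the operation written as $Ap$ in the steps above.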

We discuss below an important property of this pro- 
cedure which is the transparency with respect to projec- 
tion model: central and parallel projection are treated 
alike — a property which has implications on stability 
of re-projection no matter what degree of perspective 
distortions are present in the images. 

7.2 The Case of Parallel Projection 

The construction for obtaining projective structure is
well defined for all central projections, including the case
where the center of projection is an ideal point, as
happens with parallel projection. The construction
has two components: the first has to do with
recovering the epipolar geometry via reference planes,
and the second is the projective invariant $\alpha_p$.

From Proposition 1 the projective transformations A 
and E can be uniquely determined from three corre- 
sponding points and the corresponding epipoles. If both 
epipoles are ideal, the transformations become affine 
transformations of the plane (an affine transformation 
separates ideal points from Euclidean points). All other 
possibilities (both epipoles are Euclidean, one epipole 
Euclidean and the other epipole ideal) lead to projective 
transformations. Because a projectivity of the projec- 
tive plane is uniquely determined from any four points 
on the projective plane (provided no three are collinear), 
the transformations A and E are uniquely determined 
under all situations of central projection — including 
parallel projection. 

The projective invariant $\alpha_p$ is the same as the one
defined under parallel projection (Section 5): affine
structure is a particular instance of projective structure
in which the epipole $V_l$ is an ideal point. By using the
same invariant for both parallel and central projection,
and because all other elements of the geometric construction
hold for both projection models, the overall system
is transparent to the projection model being used.

The first implication of this property has to do with 
stability. Projective structure does not require any perspective
distortions; therefore all imaging situations can
be handled, with wide or narrow fields of view. The second
implication is that 3D visual recognition from 2D images 
can be achieved in a uniform manner with regard to the 
projection model. For instance, we can recognize (via re- 
projection) a perspective image of an object from only 
two orthographic model images, and in general any com- 
bination of perspective and orthographic images serving 
as model or novel views is allowed. 

The results so far required prior knowledge (or as- 
sumption) that four of the corresponding points are com- 
ing from coplanar points in space. This requirement can 
be avoided, using two more corresponding points (mak- 
ing eight points overall), and is described in the next 
section. 



8 Epipoles from Eight Points 

We adopt a recent algorithm suggested by Faugeras 
(1992) which is based on Longuet-Higgins' (1981) funda- 
mental matrix. The method is very simple and requires 
eight corresponding points for recovering the epipoles. 

Let $F$ be an epipolar transformation, i.e., $Fl = \rho l'$,
where $l = V_r \times p$ and $l' = V_l \times p'$ are corresponding
epipolar lines. We can rewrite the projective relation of
epipolar lines using the matrix form of cross-products:

$F(V_r \times p) = F[V_r]p = \rho l'$,

where $[V_r]$ is a skew symmetric matrix (and hence has
rank 2). From the point/line incidence property we have
that $p' \cdot l' = 0$ and therefore $p'^\top F[V_r]p = 0$, or $p'^\top Hp = 0$
where $H = -F[V_r]$. The matrix $H$ is known as the fundamental
matrix introduced by Longuet-Higgins (1981),
and is of rank 2. One can recover $H$ (up to a scale factor)
directly from eight corresponding points, or by using a
principal components approach if more than eight points
are available. Finally, it is easy to see that

$HV_r = 0$,

and therefore the epipole $V_r$ can be uniquely recovered
(up to a scale factor). Note that the determinant of
the first principal minor of $H$ vanishes in the case where
$V_r$ is an ideal point, i.e., $h_{11}h_{22} - h_{12}h_{21} = 0$. In that
case, the $x, y$ components of $V_r$ can be recovered (up to
a scale factor) from the third row of $H$. The epipoles,
therefore, can be uniquely recovered under both central
and parallel projection. We have arrived at the following
theorem:

Theorem 2 In the case where we have eight corresponding
points of two views taken under central projection
(including parallel projection), four of these points, coming
from four non-coplanar points in space, are sufficient
for computing the projective structure invariant $\alpha_p$
for the remaining four points and for all other points in
space projecting onto corresponding points in both views.
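The step of recovering the epipole from $HV_r = 0$ can be sketched as follows, assuming $H$ is already known and has exactly rank 2. The helper names and the numeric values are hypothetical; the test matrix is built as $-F[V_r]$ from an arbitrary non-singular $F$, as in the derivation above.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def skew(v):
    """Matrix [v] such that [v] p equals the cross product v x p."""
    return [[0.0, -v[2], v[1]],
            [v[2], 0.0, -v[0]],
            [-v[1], v[0], 0.0]]


def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def right_null_vector(h):
    """Null vector of a rank-2 3x3 matrix, defined up to scale.

    A vector orthogonal to two independent rows is orthogonal to the whole
    row space, so the cross product of two rows spans the null space.
    The pair of rows with the largest cross product is used.
    """
    candidates = [cross(h[0], h[1]), cross(h[0], h[2]), cross(h[1], h[2])]
    return max(candidates, key=lambda v: sum(c * c for c in v))


# Hypothetical example: V_r = (3, 4, 1) and an arbitrary non-singular F.
v_r = (3.0, 4.0, 1.0)
f = [[1.0, 2.0, 0.0], [0.0, 1.0, 1.0], [2.0, 0.0, 1.0]]
h = [[-x for x in row] for row in matmul(f, skew(v_r))]
recovered = right_null_vector(h)
```

If the third component of the recovered vector is zero, the epipole is an ideal point and only the direction given by its first two components is meaningful. With a noisy estimate of $H$ that is not exactly rank 2, a least-squares null vector would be needed instead.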

We summarize in the following section the 8-point 
scheme for reconstructing projective structure and per- 
forming re-projection onto a novel view. 

8.1 8-point Re-projection Algorithm 

We assume we have eight corresponding points between
two model views and the novel view, $p_j \leftrightarrow p'_j \leftrightarrow p''_j$,
$j = 1, \ldots, 8$, and that the first four points are coming from
four non-coplanar points in space. The computations
for recovering projective structure and performing
re-projection are described below.

1: Recover the fundamental matrix $H$ (up to a scale
factor) that satisfies $p'^\top_j Hp_j = 0$, $j = 1, \ldots, 8$. The right
epipole $V_r$ then satisfies $HV_r = 0$. Similarly, the
left epipole is recovered from the relation $p_j^\top \tilde{H}p'_j = 0$ and
$\tilde{H}V_l = 0$ (where $\tilde{H} = H^\top$).

2: Recover the transformation $A$ that satisfies $\rho V_l = AV_r$
and $\rho_j p'_j = Ap_j$, $j = 1, 2, 3$. Similarly, recover
the transformation $E$ that satisfies $\rho V_l = EV_r$ and
$\rho_j p'_j = Ep_j$, $j = 2, 3, 4$.

3: Compute $\alpha_p$ as the cross-ratio of $p', Ap, Ep, V_l$, for
all points $p$.

4: Perform steps 1 and 2 between the first and novel
view: recover the epipoles $V_{rn}, V_{ln}$ and the
transformations $A$ and $E$.

5: For every point $p$, recover $p''$ from the cross-ratio $\alpha_p$
and the three rays $Ap, Ep, V_{ln}$. Normalize $p''$ such
that its third coordinate is set to 1.

We discuss next the possibility of working with a rigid 
camera (i.e., perspective projection and calibrated cam- 
era). 

9 The Rigid Camera Case 

The advantage of the non-rigid camera model (or the 
central projection model) used so far is that images can 
be obtained from uncalibrated cameras. The price paid 
for this property is that the images that produce the 
same projective structure invariant (equivalence class of 
images of the object) can be produced by applying non- 
rigid transformations of the object, in addition to rigid 
transformations. 

In this section we show that it is possible to verify 
whether the images were produced by rigid transfor- 
mations, which is equivalent to working with perspec- 
tive projection assuming the cameras are internally cal- 
ibrated. This can be done for both schemes presented 
above, i.e., the 6-point and 8-point algorithms. In both 
cases we exclude orthographic projection and assume 
only perspective projection. 

In the perspective case, the second reference plane is 
the image plane of the first model view, and the trans- 
formation for projecting the second reference plane onto 
any other view is the rotational component of camera 
motion (rigid transformation). We recover the rota- 
tional component of camera motion by adopting a re- 
sult derived by Lee (1988), who shows that the rota- 
tional component of motion can be uniquely determined 
from two corresponding points and the corresponding 
epipoles. We then show that projective structure can be 
uniquely determined, up to a uniform scale factor, from 
two calibrated perspective images. 

Proposition 3 (Lee, 1988) In the case of perspective
projection, the rotational component of camera motion 
can be uniquely recovered, up to a reflection, from two 
corresponding points and the corresponding epipoles. 
The reflection component can also be uniquely deter- 
mined by using a third corresponding point. 

Proof: Let $l'_j = p'_j \times V_l$ and $l_j = p_j \times V_r$, $j = 1, 2$
be two corresponding epipolar lines. Because $R$ is an orthogonal
matrix, it leaves vector magnitudes unchanged,
and we can normalize the lengths of $l'_1, l'_2, V_l$ to be the
same as the lengths of $l_1, l_2, V_r$, respectively. We have therefore,
$l'_j = Rl_j$, $j = 1, 2$, and $V_l = RV_r$, which is sufficient for
determining $R$ up to a reflection. Note that because $R$
is a rigid transformation, it is both an epipolar and an
induced epipolar transformation (the induced transformation
$E$ is determined by $E = (R^{-1})^\top$, therefore $E = R$
because $R$ is an orthogonal matrix).




Figure 5: Illustration that projective shape can be re- 
covered only up to a uniform scale (see text). 

To determine the reflection component, it is sufficient
to observe a third corresponding point $p_3 \leftrightarrow p'_3$. The
object point $P_3$ is along the ray $V_1p_3$ and therefore has
the coordinates $\alpha_3 p_3$ (w.r.t. the first camera coordinate
frame), and is also along the ray $V_2p'_3$ and therefore has
the coordinates $\alpha'_3 p'_3$ (w.r.t. the second camera coordinate
frame). We note that the ratio between $\alpha_3$ and
$\alpha'_3$ is a positive number. The change of coordinates is
represented by:

$\beta V_l + \alpha_3 Rp_3 = \alpha'_3 p'_3$,

where $\beta$ is an unknown constant. If we multiply both
sides of the equation by $l'_j$, $j = 1, 2, 3$, the term $\beta V_l$
drops out, because $V_l$ is incident to all left epipolar lines,
and after replacing $l'^\top_j R$ by $l_j^\top$, we are left with,

$\alpha_3 l_j \cdot p_3 = \alpha'_3 l'_j \cdot p'_3$,

which is sufficient for determining the sign of $l'_j$. □
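One way to solve the three vector equations $l'_j = Rl_j$, $j = 1, 2$, and $V_l = RV_r$ for $R$ is sketched below. It is a sketch under assumptions, not Lee's own procedure: it assumes the lengths and signs of the vectors have already been fixed as discussed in the proof, that the data are noise-free, and that the three source vectors are linearly independent. All names and numbers are hypothetical.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(m, v):
    return tuple(dot(row, v) for row in m)


def rotation_from_three_pairs(src, dst):
    """Matrix R with R src[k] = dst[k] for three independent vectors src[k].

    R = D S^{-1}, where S and D have the src and dst vectors as columns.
    The rows of S^{-1} are the reciprocal basis (b x c, c x a, a x b) / det.
    """
    a, b, c = src
    det = dot(a, cross(b, c))
    if abs(det) < 1e-12:
        raise ValueError("source vectors are linearly dependent")
    inv_rows = [[x / det for x in cross(b, c)],
                [x / det for x in cross(c, a)],
                [x / det for x in cross(a, b)]]
    return [[sum(dst[k][i] * inv_rows[k][j] for k in range(3))
             for j in range(3)] for i in range(3)]


# Hypothetical example: a 90 degree rotation about the z-axis.
r_true = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
v_r = (1.0, 2.0, 4.0)
l1 = cross((1.0, 0.0, 1.0), v_r)   # epipolar line of p_1
l2 = cross((0.0, 1.0, 1.0), v_r)   # epipolar line of p_2
src = [l1, l2, v_r]
dst = [matvec(r_true, v) for v in src]
r_est = rotation_from_three_pairs(src, dst)
```

With noisy measurements the matrix returned this way is generally not exactly orthogonal, so a projection onto the nearest rotation would be needed; that step is outside this sketch.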

The rotation matrix R can be uniquely recovered from 
any three corresponding points and the corresponding 
epipoles. Projective structure can be reconstructed by 
replacing the transformation E of the second reference 
plane, with the rigid transformation R (which is equiv- 
alent to treating the first image plane as a reference 
plane). We show next that this can lead to projective 
structure up to an unknown uniform scale factor (unlike 
the non-rigid camera case). 

Proposition 4 In the perspective case, the projective
shape constant $\alpha_p$ can be determined, from two views,
at most up to a uniform scale factor.

Proof: Consider Figure 5, and let the effective translation
be $V_2 - V_s = k(V_2 - V_1)$, which is the true translation
scaled by an unknown factor $k$. Projective shape,
$\alpha_p$, remains fixed if the scene and the focal length of the
first view are scaled by $k$: from similarity of triangles we
have,

$$k = \frac{V_s - V_2}{V_1 - V_2} = \frac{p_s - V_s}{p - V_1} = \frac{f_s}{1} = \frac{P_s - V_s}{P - V_1} = \frac{P_s - V_2}{P - V_2},$$

where $f_s$ is the scaled focal length of the first view. Since
the magnitude of the translation along the line $V_1V_2$ is
irrecoverable, we can assume it is null, and compute $\alpha_p$
as the cross-ratio of $p', Ap, Rp, V_l$ which determines projective
structure up to a uniform scale. □

Figure 6: The basic object configuration for the experimental
set-up.

Because $\alpha_p$ is determined up to a uniform scale, we
need an additional point in order to establish a common 
scale during the process of re-projection (we can use one 
of the existing six or eight points we already have). We 
obtain, therefore, the following result: 

Theorem 3 In the perspective case, a rigid re- 
projection from two model views onto a novel view is pos- 
sible, using four corresponding points coming from four 
non-coplanar points, and the corresponding epipoles. 
The projective structure computed from two perspective
images is invariant up to an overall scale factor.

Orthographic projection is excluded from this result 
because it is well known that the rotational component 
cannot be uniquely determined from two orthographic 
views (Ullman 1979, Huang and Lee 1989, Aloimonos 
and Brown 1989). To see what happens in the case of 
parallel projection note that the epipoles are vectors on 
the xy plane of their coordinate systems (ideal points), 
and the epipolar lines are two vectors perpendicular to 
the epipole vectors. The equation $RV_r = V_l$ takes care
of the rotation in plane (around the optical axis). The
other two equations $Rl_j = l'_j$, $j = 1, 2$, take care only
of rotation around the epipolar direction; rotation
around an axis perpendicular to the epipolar direction
is not accounted for. The equations for solving for $R$
provide a non-singular system of equations but do pro- 
duce a rotation matrix with no rotational components 
around an axis perpendicular to the epipolar direction. 

10 Simulation Results Using Synthetic 
Objects 

We ran simulations using synthetic objects to illustrate 
the re-projection process using the 6-point scheme under 
various imaging situations. We also tested the robust- 
ness of the re-projection method under various types of 
noise. Because the 6-point scheme requires that four of
the corresponding points be projected from four coplanar
points in space, it is of special interest to see how
the method behaves under conditions that violate this
assumption, and under noise conditions in general. The
stability of the 8-point algorithm largely depends on the
method for recovering the epipoles. The method adopted
from Faugeras (1992), described in Section 8, based on
the fundamental matrix, tends to be very sensitive to
noise if the minimal number of points (eight points) is
used. We have, therefore, focused the experimental error
analysis on the 6-point scheme.

Figure 6 illustrates the experimental set-up. The object
consists of 26 points in space arranged in the following
manner: 14 points are on a plane (reference plane)
ortho-parallel to the image plane, and 12 points are out
of the reference plane. The reference plane is located
two focal lengths away from the center of projection (focal
length is set to 50 units). The depth of out-of-plane
points varies randomly between 10 and 25 units away from
the reference plane. The $x, y$ coordinates of all points,
except the points $P_1, \ldots, P_6$, vary randomly between
0 and 240. The 'privileged' points $P_1, \ldots, P_6$ have $x, y$
coordinates that place these points all around the object
(clustering privileged points together will inevitably
contribute to instability).

The first view is simply a perspective projection of the 
object. The second view is a result of rotating the object 
around the point (128, 128, 100) with an axis of rotation 
described by the unit vector (0.14,0.7,0.7) by an an- 
gle of 29 degrees, followed by a perspective projection 
(note that rotation about a point in space is equivalent 
to rotation about the center of projection followed by 
translation). The third (novel) view is constructed in a 
similar manner with a rotation around the unit vector 
(0.7, 0.7, 0.14) by an angle of 17 degrees. Figure 7 (first 
row) displays the three views. Also in Figure 7 (second 
row) we show the result of applying the transformation
due to the four coplanar points $p_1, \ldots, p_4$ (Step 1, see
Section 7.1) to all points in the first view. We see that all
the coplanar points are aligned with their correspond- 
ing points in the second view, and all other points are 
situated along epipolar lines. The display on the right 
in the second row shows the final re-projection result (8- 
point and 6-point methods produce the same result). All 
points re-projected from the two model views are accu- 
rately (noise-free experiment) aligned with their corre- 
sponding points in the novel view. 
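The way such synthetic views can be generated is sketched below: a rotation about a point in space using Rodrigues' formula, followed by a perspective projection. The camera conventions are assumptions made for illustration (center of projection at the origin, optical axis along $z$, image plane at $z = f$), as are the function names; the paper does not spell out these details. The axis is normalised inside the function, since the quoted vectors are only approximately of unit length.

```python
import math


def rotate_about_point(point, center, axis, angle_deg):
    """Rotate `point` about the line through `center` with direction `axis`."""
    n = math.sqrt(sum(c * c for c in axis))
    k = [c / n for c in axis]                       # unit rotation axis
    v = [p - c for p, c in zip(point, center)]      # move the pivot to the origin
    t = math.radians(angle_deg)
    cos_t, sin_t = math.cos(t), math.sin(t)
    k_cross_v = [k[1] * v[2] - k[2] * v[1],
                 k[2] * v[0] - k[0] * v[2],
                 k[0] * v[1] - k[1] * v[0]]
    k_dot_v = sum(a * b for a, b in zip(k, v))
    # Rodrigues' rotation formula
    r = [v[i] * cos_t + k_cross_v[i] * sin_t + k[i] * k_dot_v * (1.0 - cos_t)
         for i in range(3)]
    return tuple(r[i] + center[i] for i in range(3))


def perspective_project(point, focal_length=50.0):
    """Central projection onto the plane z = focal_length."""
    x, y, z = point
    return (focal_length * x / z, focal_length * y / z)


# A point on the reference plane (z = 100, two focal lengths away).
scene_point = (100.0, 40.0, 100.0)
first_view = perspective_project(scene_point)

# Second view: rotation of 29 degrees about (128, 128, 100), as in the text.
moved = rotate_about_point(scene_point, (128.0, 128.0, 100.0),
                           (0.14, 0.7, 0.7), 29.0)
second_view = perspective_project(moved)
```

Applying the two functions to every object point yields one synthetic view per motion.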

The third row of Figure 7 illustrates a more challenging
imaging situation (still noise-free). The second view
is orthographically projected (and scaled by 0.5) following
the same rotation and translation as before, and the
novel view is a result of a central projection onto a tilted
image plane (rotated by 12 degrees around a coplanar
axis parallel to the $x$-axis). We have therefore the situation
of recognizing a non-rigid perspective projection
from a novel viewing position, given a rigid perspective
projection and a rigid orthographic projection from
two model viewing positions. The 6-point re-projection
scheme was applied with the result that all re-projected
points are in accurate alignment with their corresponding
points in the novel view. Identical results were observed
with the 8-point algorithms.

Figure 7: Illustration of Re-projection. Row 1 (left to right): Three views of the object, two model views and a
novel view, constructed by rigid motion following perspective projection. The filled dots represent $p_1, \ldots, p_4$ (coplanar
points). Row 2: Overlay of the second view and the first view following the transformation due to the reference
plane (Step 1, Section 7.1). All coplanar points are aligned with their corresponding points, the remaining points are
situated along epipolar lines. The righthand display is the result of re-projection: the re-projected image perfectly
matches the novel image (noise-free situation). Row 3: The lefthand display shows the second view which is now
orthographic. The middle display shows the third view which is now a perspective projection onto a tilted image
plane. The righthand display is the result of re-projection which perfectly matches the novel view.

The remaining experiments, discussed in the follow- 
ing sections, were done under various noise conditions. 
We conducted three types of experiments. The first
experiment tested the stability under the situation where
$P_1, \ldots, P_4$ are non-coplanar object points. The second
experiment tested stability under random noise added
to all image points in all views, and the third experiment
tested stability under the situation where less noise
is added to the privileged six points than to other points.

10.1 Testing Deviation from Coplanarity 

In this experiment we investigated the effect of translating
$P_1$ along the optical axis (of the first camera position)
from its initial position on the reference plane ($z = 100$)
to the farthest depth position ($z = 125$), in increments
of one unit at a time. The experiment was conducted using
several objects of the type described above (the six
privileged points were fixed, the remaining points were
assigned random positions in space in different trials),
undergoing the same motion described above (as in Figure 7,
first row). The effect of depth translation to the
level $z = 125$ on the location of $p_1$ is a shift of 0.93 pixels,
on $p'_1$ is 1.58 pixels, and on the location of $p''_1$ is 3.26
pixels. Depth translation is therefore equivalent to perturbing
the location of the projections of $P_1$ by various
degrees (depending on the 3D motion parameters).

Figure 8 shows the average pixel error in re-projection 
over the entire range of depth translation. The average 
pixel error was measured as the average of deviations 
from the re-projected point to the actual location of the 
corresponding point in the novel view, taken over all 
points. Figure 8 also displays the result of re-projection 
for the case where $P_1$ is at $z = 125$. The average error
is 1.31, and the maximal error (the point with the most 
deviation) is 7.1 pixels. The alignment between the re- 
projected image and the novel image is, for the most 
part, fairly accurate. 
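The error measure described above can be written down directly. The sketch below assumes that the deviation of a point is its Euclidean distance in pixels from the corresponding point in the novel view; the function name and the sample coordinates are hypothetical.

```python
import math


def reprojection_errors(reprojected, observed):
    """Average and maximal Euclidean deviation between two point lists.

    Both arguments are lists of (x, y) image points in the same order.
    """
    deviations = [math.hypot(a[0] - b[0], a[1] - b[1])
                  for a, b in zip(reprojected, observed)]
    return sum(deviations) / len(deviations), max(deviations)


# Hypothetical data: one exact match and one point that is off by (3, 4).
average_error, maximal_error = reprojection_errors(
    [(10.0, 10.0), (23.0, 34.0)],
    [(10.0, 10.0), (20.0, 30.0)])
```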

10.2 Situation of Random Noise to all Image 
Locations 

We next add random noise to all image points in all
three views ($P_1$ is set back to the reference plane). This
experiment was done repeatedly over various degrees of
noise and over several objects. The results shown here
have noise between 0 and 1 pixels randomly added to the $x$
and $y$ coordinates separately. The maximal perturbation
is therefore $\sqrt{2}$ pixels, and because the direction of perturbation
is random, the maximal error in relative location is
double, i.e., about 2.8 pixels. Figure 9 shows the average pixel
errors over 10 trials (one particular object, the same mo- 
tion as before). The average error fluctuates around 1.6 
pixels. Also shown is the result of re-projection on a typ- 
ical trial with average error of 1.05 pixels, and maximal 
error of 5.41 pixels. The match between the re-projected 
image and the novel image is relatively good considering 
the amount of noise added. 

10.3 Random Noise Case 2 

A more realistic situation occurs when the magnitude of
noise associated with the privileged six points is much
lower than the noise associated with other points, for
the reason that we are interested in tracking points of 
interest that are often associated with distinct inten- 
sity structure (such as the tip of the eye in a picture 
of a face). Correlation methods, for instance, are known 
to perform much better on such locations, than on ar- 
eas having smooth intensity change, or areas where the 
change in intensity is one-dimensional. We therefore ap- 
plied a level of 0-0.3 perturbation to the x and y coor- 
dinates of the six points, and a level of 0-1 to all other 
points (as before). The results are shown in Figure 10. 
The average pixel error over 10 trials fluctuates around 
0.5 pixels, and the re-projection shown for a typical trial 
(average error 0.52, maximal error 1.61) is in relatively 
good correspondence with the novel view. With larger 
perturbations at a range of 0-2, the algorithm behaves 
proportionally well, i.e., the average error over 10 trials 
is 1.37. 

11 Summary 

In this paper we focused on the problem of recovering 
relative, non-metric, structure from two views of a 3D 
object. Specifically, the invariant structure we recover 
does not require internal camera calibration, does not 
involve full reconstruction of shape (Euclidean or pro- 
jective coordinates), and treats parallel and central pro- 
jection as an integral part of one unified system. We 
have also shown that the invariant can be used for the 
purposes of visual recognition, within the framework of 
the alignment approach to recognition. 

The study is based on an extension of Koenderink and 
Van Doorn's representation of affine structure as an invariant
defined with respect to a reference plane and
a reference point. We first showed that the KV affine 
invariant cannot be extended directly to a projective in- 
variant (Appendix D), but there exists another affine in- 
variant, described with respect to two reference planes, 
that can easily be extended to projective space. As a 
result we obtained the projective structure invariant. 

We have shown that the difference between the affine
and projective cases lies entirely in the location of epipoles,
i.e., given the location of epipoles both the affine and 
projective structure are constructed from the same infor- 
mation captured by four corresponding points projected 
from four non-coplanar points in space. Therefore, the 
additional corresponding points in the projective case 
are used solely for recovering the location of epipoles. 

We have shown that the location of epipoles can be 
recovered under both parallel and central projection us- 
ing six corresponding points, with the assumption that 
four of those points are projected from four coplanar 
points in space, or alternatively by having eight cor- 
responding points without assumptions on coplanarity. 
The overall method for reconstructing projective struc- 
ture and achieving re-projection was referred to as the 6- 
point and the 8-point algorithms. These algorithms have 
the unique property that projective structure can be re- 
covered from both orthographic and perspective images 
from uncalibrated cameras. This property implies, for 
instance, that we can perform recognition of a perspective
image of an object given two orthographic images as
a model. It also implies greater stability because the size
of the field of view is no longer an issue in the process of
reconstructing shape or performing re-projection.

Figure 8: Deviation from coplanarity: average pixel error due to translation of $P_1$ along the optical axis from z = 100
to z = 125, by increments of one unit. The result of re-projection (overlay of re-projected image and novel image)
for the case z = 125. The average error is 1.31 and the maximal error is 7.1.

Figure 9: Random noise added to all image points, over all views, for 10 trials (plot title: "Random Error Trials").
Average pixel error fluctuates around 1.6 pixels. The result of re-projection on a typical trial with average error
of 1.05 pixels, and maximal error of 5.41 pixels.

A Fundamental Theorem of Plane
Projectivity

The fundamental theorem of plane projectivity states 
that a projective transformation of the plane is com- 
pletely determined by four corresponding points. We 
prove the theorem by first using a geometric drawing, 
and then algebraically by introducing the concept of rays 
(homogeneous coordinates). The appendix ends with the 
system of linear equations for determining the correspon- 
dence of all points in the plane, given four corresponding 
points (used repeatedly throughout this paper). 

Definitions: A perspectivity between two planes is 
defined as a central projection from one plane onto the 
other. A projectivity is defined as made out of a finite 
sequence of perspectivities. A projectivity, when represented
in an algebraic form, is called a projective transformation.
The fundamental theorem states that a projectivity
is completely determined by four corresponding
points.

Geometric Illustration 

Figure 10: Random noise added to non-privileged image points, over all views, for 10 trials (plot title: "Random
Error to Non-privileged Points Trials"). Average pixel error fluctuates around 0.5 pixels. The result of re-projection
on a typical trial with average error of 0.52 pixels, and maximal error of 1.61 pixels.

Figure 11: The geometry underlying plane projectivity
from four points.

Consider the geometric drawing in Figure 11. Let
$A, B, C, U$ be four coplanar points in the scene, and let
$A', B', C', U'$ be their projection in the first view, and
$A'', B'', C'', U''$ be their projection in the second view.
By construction, the two views are projectively related
to each other. We further assume that no three of the
points are collinear (four points form a quadrangle), and
without loss of generality let $U$ be located within the
triangle $ABC$. Let $BC$ be the $x$-axis and $BA$ be the
$y$-axis. The projection of $U$ onto the $x$-axis, denoted by
$U_x$, is the intersection of the line $AU$ with the $x$-axis.
Similarly $U_y$ is the intersection of the line $CU$ with the
$y$-axis. Because straight lines project onto straight lines,
we have that $U_x, U_y$ correspond to $U'_x, U'_y$ if and only if $U$
corresponds to $U'$. For any other point $P$, coplanar with
$ABCU$ in space, its coordinates $P_x, P_y$ are constructed
in a similar manner. We therefore have that $B, U_x, P_x, C$
are collinear and therefore the cross ratio must be equal
to the cross ratio of $B', U'_x, P'_x, C'$, i.e.

$$\frac{BC \cdot U_xP_x}{BP_x \cdot U_xC} = \frac{B'C' \cdot U'_xP'_x}{B'P'_x \cdot U'_xC'}.$$

This form of cross ratio is known as the canonical cross
ratio. In general there are 24 cross ratios, six of which are
numerically different (see Appendix B for more details
on cross-ratios). Similarly, the cross ratio along the $y$-axis
of the reference frame is equal to the cross ratio of
the corresponding points in both views.
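The two facts used here, invariance of the cross ratio under a projectivity and the count of six distinct values among the 24 orderings, can be checked numerically. The sketch below works with exact rational arithmetic on points given by a single coordinate along their common line; the sample coordinates and the one-dimensional projectivity are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def canonical_cross_ratio(b, u, p, c):
    """Cross ratio (BC * UP) / (BP * UC) of four collinear points.

    Each argument is the signed coordinate of a point along the line.
    """
    return Fraction((c - b) * (p - u), (p - b) * (c - u))


def projective_map(x):
    """An arbitrary 1D projectivity x -> (2x + 1) / (x + 3)."""
    return Fraction(2 * x + 1, x + 3)


points = (0, 1, 3, 7)                      # B, U_x, P_x, C along the x-axis
mapped = tuple(projective_map(x) for x in points)

before = canonical_cross_ratio(*points)
after = canonical_cross_ratio(*mapped)

# All 24 orderings of the four points give only six different values.
distinct_values = {canonical_cross_ratio(*perm) for perm in permutations(points)}
```

For special configurations (for example a harmonic set of points) fewer than six distinct values occur; the sample points above are generic.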

Therefore, for any point $p'$ in the first view, we construct
its $x$ and $y$ locations, $p'_x, p'_y$ along $B'C'$ and $B'A'$,
respectively. From the equality of cross ratios we find
the locations of $p''_x, p''_y$ and that leads to $p''$. Because
we have used only projective constructions, i.e. straight
lines project to straight lines, we are guaranteed that $p'$
and $p''$ are corresponding points.



Algebraic Derivation 

From an algebraic point of view it is convenient to view 
points as lying on rays emanating from the center of
projection. A ray representation is also called the homo- 
geneous coordinates representation of the plane, and is 
achieved by adding a third coordinate. Two vectors rep- 
resent the same point X = (x, y, z) if they differ at most 
by a scale factor (different locations along the same ray). 
A key result, which makes this representation amenable 
to application of linear algebra to geometry, is described 
in the following proposition: 

Proposition 5 A projectivity of the plane is equivalent 
to a linear transformation of the homogeneous represen- 
tation. 

The proof is omitted here, and can be found in Tuller 
(1967, Theorems 5.22, 5.24). A projectivity is equiv- 
alent, therefore, to a linear transformation applied to 
the rays. Because the correspondence between points 
and coordinates is not one-to-one, we have to take scalar 
factors of proportionality into account when representing a projective transformation. An arbitrary projective transformation of the plane can be represented as a non-singular linear transformation (also called a collineation) $\rho X' = TX$, where $\rho$ is an arbitrary scale factor.

Given four corresponding rays $p_j = (x_j, y_j, 1) \longleftrightarrow p'_j = (x'_j, y'_j, 1)$, we would like to find a linear transformation $T$ and the scalars $\rho_j$ such that $\rho_j p'_j = T p_j$. Note that because only ratios are involved, we can set $\rho_4 = 1$. The following are a basic lemma and theorem adapted from Semple and Kneebone (1952).

Lemma 1 If $p_1, \ldots, p_4$ are four vectors in $R^3$, no three of which are linearly dependent, and if $e_1, \ldots, e_4$ are respectively the vectors $(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)$, there exists a non-singular linear transformation $A$ such that $A e_j = \lambda_j p_j$, where the $\lambda_j$ are non-zero scalars; and the matrices of any two transformations with this property differ at most by a scalar factor.

Proof: Let $p_j$ have the components $(x_j, y_j, 1)$, and without loss of generality let $\lambda_4 = 1$. The matrix $A$ satisfies the three conditions $A e_j = \lambda_j p_j$, $j = 1, 2, 3$, if and only if $\lambda_j p_j$ is the $j$'th column of $A$. Because of the fourth condition, the values $\lambda_1, \lambda_2, \lambda_3$ satisfy

$$[p_1, p_2, p_3] \begin{pmatrix} \lambda_1 \\ \lambda_2 \\ \lambda_3 \end{pmatrix} = p_4$$

and since, by hypothesis of linear independence of $p_1, p_2, p_3$, the matrix $[p_1, p_2, p_3]$ is non-singular, the $\lambda_j$ are uniquely determined and non-zero. The matrix $A$ is therefore determined up to a scalar factor. □

Theorem 4 If $p_1, \ldots, p_4$ and $p'_1, \ldots, p'_4$ are two sets of four vectors in $R^3$, no three vectors in either set being linearly dependent, there exists a non-singular linear transformation $T$ such that $T p_j = \rho_j p'_j$ $(j = 1, \ldots, 4)$, where the $\rho_j$ are scalars; and the matrix $T$ is uniquely determined apart from a scalar factor.

Proof: By the lemma, we can solve for $A$ and $\lambda_j$ that satisfy $A e_j = \lambda_j p_j$ $(j = 1, \ldots, 4)$, and similarly we can choose $B$ and $\mu_j$ to satisfy $B e_j = \mu_j p'_j$; and without loss of generality assume that $\lambda_4 = \mu_4 = 1$. We then have $T = BA^{-1}$ and $\rho_j = \mu_j / \lambda_j$. If, further, $T p_j = \rho_j p'_j$ and $U p_j = \sigma_j p'_j$, then $TA e_j = \rho_j \lambda_j p'_j$ and $UA e_j = \sigma_j \lambda_j p'_j$; and therefore, by the lemma, $TA = \tau UA$, i.e., $T = \tau U$ for some scalar $\tau$. □
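
The constructive steps of Lemma 1 and Theorem 4 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used for the experiments: the function names (`basis_map`, `projectivity`, and the small matrix helpers) and the sample coordinates are introduced here for illustration only, and exact rational arithmetic is used so that the proportionality $T p_j \propto p'_j$ can be checked without rounding.

```python
from fractions import Fraction

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def inv3(m):
    d = Fraction(det3(m))
    if d == 0:
        raise ValueError("matrix is singular")
    def cof(i, j):
        # Cofactor (i, j); cyclic indexing supplies the sign automatically.
        r0, r1 = (i + 1) % 3, (i + 2) % 3
        c0, c1 = (j + 1) % 3, (j + 2) % 3
        return m[r0][c0] * m[r1][c1] - m[r0][c1] * m[r1][c0]
    # inverse[i][j] = cofactor(j, i) / det
    return [[cof(j, i) / d for j in range(3)] for i in range(3)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def matvec(a, v):
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]

def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def basis_map(p1, p2, p3, p4):
    """Lemma 1: matrix A with A e_j = lambda_j p_j and lambda_4 = 1."""
    cols = [[p1[i], p2[i], p3[i]] for i in range(3)]   # the matrix [p1, p2, p3]
    lam = matvec(inv3(cols), p4)                       # solves [p1, p2, p3] lam = p4
    if any(l == 0 for l in lam):
        raise ValueError("three of the four vectors are linearly dependent")
    return [[lam[j] * cols[i][j] for j in range(3)] for i in range(3)]

def projectivity(src, dst):
    """Theorem 4: T = B A^{-1}, so that T p_j is proportional to p'_j."""
    return matmul(basis_map(*dst), inv3(basis_map(*src)))

# Hypothetical sample data: a unit square mapped to a general quadrilateral.
src = [(0, 0, 1), (1, 0, 1), (1, 1, 1), (0, 1, 1)]
dst = [(0, 0, 1), (2, 0, 1), (3, 2, 1), (0, 1, 1)]
T = projectivity(src, dst)
```

Two rays represent the same point when their cross product vanishes, so `cross(matvec(T, p), q)` is the zero vector for each corresponding pair `(p, q)`.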

The immediate implication of the theorem is that one can solve directly for $T$ and $\rho_j$ ($\rho_4 = 1$). Four points provide twelve equations and we have twelve unknowns (nine for $T$ and three for $\rho_j$). Furthermore, because the system is linear, one can look for a least squares solution by using more than four corresponding points (they all have to be coplanar): each additional point provides three more equations and one more unknown (the $\rho$ associated with it).

Alternatively, one can eliminate $\rho_j$ from the equations, set $t_{3,3} = 1$ and set up directly a system of eight linear equations as follows. In general we have four corresponding rays $p_j = (x_j, y_j, z_j) \longleftrightarrow p'_j = (x'_j, y'_j, z'_j)$, $j = 1, \ldots, 4$, and the linear transformation $T$ satisfies $\rho_j p'_j = T p_j$. By eliminating $\rho_j$, each pair of corresponding rays contributes the following two linear equations:

$$x_j t_{1,1} + y_j t_{1,2} + z_j t_{1,3} - \frac{x_j x'_j}{z'_j} t_{3,1} - \frac{y_j x'_j}{z'_j} t_{3,2} = \frac{z_j x'_j}{z'_j}$$

$$x_j t_{2,1} + y_j t_{2,2} + z_j t_{2,3} - \frac{x_j y'_j}{z'_j} t_{3,1} - \frac{y_j y'_j}{z'_j} t_{3,2} = \frac{z_j y'_j}{z'_j}$$

A similar pair of equations can be derived in the case $z'_j = 0$ (ideal points) by using either $x'_j$ or $y'_j$ (all three cannot be zero).
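
The eight-equation system above can be assembled and solved as in the following Python sketch. It is a hedged illustration only: the names `solve`, `projectivity_8eq` and `apply` are hypothetical, the unknowns are ordered $(t_{1,1}, t_{1,2}, t_{1,3}, t_{2,1}, t_{2,2}, t_{2,3}, t_{3,1}, t_{3,2})$ by assumption, and the sketch handles only the case $z'_j \neq 0$. Exact rational arithmetic stands in for whatever numerical solver one would use in practice.

```python
from fractions import Fraction

def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n] for row in m]

def projectivity_8eq(src, dst):
    """Recover T (with t33 = 1) from four pairs of corresponding rays."""
    rows, rhs = [], []
    for (x, y, z), (xp, yp, zp) in zip(src, dst):
        if zp == 0:
            raise ValueError("ideal point: divide by x' or y' instead of z'")
        u, v = Fraction(xp) / zp, Fraction(yp) / zp      # x'/z' and y'/z'
        rows.append([x, y, z, 0, 0, 0, -x * u, -y * u])  # first equation
        rhs.append(z * u)
        rows.append([0, 0, 0, x, y, z, -x * v, -y * v])  # second equation
        rhs.append(z * v)
    t = solve(rows, rhs) + [Fraction(1)]
    return [t[0:3], t[3:6], t[6:9]]

def apply(T, p):
    """Map a ray through T and normalize the third component back to 1."""
    q = [sum(T[i][k] * p[k] for k in range(3)) for i in range(3)]
    return [q[0] / q[2], q[1] / q[2], Fraction(1)]

# Hypothetical data generated from T_true = [[2, 1, 0], [0, 1, 3], [1, 0, 1]].
src = [(0, 0, 1), (1, 0, 1), (1, 1, 1), (0, 1, 1)]
dst = [(0, 3, 1), (2, 3, 2), (3, 4, 2), (1, 4, 1)]
T = projectivity_8eq(src, dst)
mapped = apply(T, (2, 3, 1))
```

Fixing $t_{3,3} = 1$ assumes that this entry is non-zero in the true transformation; when that cannot be guaranteed, the twelve-equation formulation of the previous paragraph avoids the assumption.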

Projectivity Between Two Image Planes of an
Uncalibrated Camera

We can use the fundamental theorem of plane pro- 
jectivity to recover the projective transformation that 
was illustrated geometrically in Figure 11. Given four corresponding points $(x_j, y_j) \longleftrightarrow (x'_j, y'_j)$ that are projected from four coplanar points in space we would like to find the projective transformation $A$ that accounts for all other correspondences $(x, y) \longleftrightarrow (x', y')$ that are projected from coplanar points in space.

Figure 12: Setting a projectivity under parallel projection.



The standard way to proceed is to assume that both image planes are parallel to their xy plane with a focal length of one unit, or in other words to embed the image coordinates in a 3D vector whose third component is 1. Let $p_j = (x_j, y_j, 1)$ and $p'_j = (x'_j, y'_j, 1)$ be the chosen representation of image points. The true coordinates of those image points may be different (if the image planes are in different positions than assumed), but the main point is that all such representations are projectively equivalent to each other. Therefore, $\rho_j p_j = B \hat{p}_j$ and $\mu_j p'_j = C \hat{p}'_j$, where $\hat{p}_j$ and $\hat{p}'_j$ are the true image coordinates of these points. If $T$ is the projective transformation determined by the four corresponding points $\hat{p}_j \longleftrightarrow \hat{p}'_j$, then $A = CTB^{-1}$ is the projective transformation between the assumed representations $p_j \longleftrightarrow p'_j$.

Therefore, the matrix $A$ can be solved for directly from the correspondences $p_j \longleftrightarrow p'_j$ (the system of eight equations detailed in the previous section). For any given point $p = (x, y, 1)$, the corresponding point $p' = (x', y', 1)$ is determined by $Ap$ followed by normalization to set the third component back to 1.

A.1 Plane Projectivity in Affine Geometry

In parallel projection we can take advantage of the fact that parallel lines project to parallel lines. This allows us to define coordinates on the plane by subtending lines parallel to the axes (see Figure 12). Note also that the two trapezoids $BB'p'_x p_x$ and $BB'C'C$ are similar trapezoids, therefore,

$$\frac{BC}{p_x C} = \frac{B'C'}{p'_x C'}.$$

This provides a geometric derivation of the result that 
three points are sufficient to set up a projectivity be- 
tween any two planes under parallel projection. 

Figure 13: The cross-ratio of four distinct concurrent rays is equal to the cross-ratio of the four distinct points that result from intersecting the rays by a transversal.

Algebraically, a projectivity of the plane can be uniquely represented as a 2D affine transformation of the non-homogeneous coordinates of the points. Namely, if $p = (x, y)$ and $p' = (x', y')$ are two corresponding points, then

$$p' = Ap + w$$

where $A$ is a non-singular matrix and $w$ is a vector. The six parameters of the transformation can be recovered from two non-collinear sets of three points, $p_0, p_1, p_2$ and $p'_0, p'_1, p'_2$. Let

$$A = \begin{pmatrix} x'_1 - x'_0 & x'_2 - x'_0 \\ y'_1 - y'_0 & y'_2 - y'_0 \end{pmatrix} \begin{pmatrix} x_1 - x_0 & x_2 - x_0 \\ y_1 - y_0 & y_2 - y_0 \end{pmatrix}^{-1}$$

and $w = p'_0 - A p_0$, which together satisfy $p'_j - p'_0 = A(p_j - p_0)$ for $j = 1, 2$. For any arbitrary point $p$ on the plane, we have that $p - p_0$ is spanned by the two vectors $p_1 - p_0$ and $p_2 - p_0$, i.e., $p - p_0 = \alpha_1 (p_1 - p_0) + \alpha_2 (p_2 - p_0)$; and because translation in depth is lost in parallel projection, we have that $p' - p'_0 = \alpha_1 (p'_1 - p'_0) + \alpha_2 (p'_2 - p'_0)$, and therefore $p' - p'_0 = A(p - p_0)$.
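
The recovery of the six affine parameters from three point pairs can be sketched as follows. This is a small illustration under stated assumptions: the function name `affine_from_three_points` and the sample coordinates are hypothetical, and exact rational arithmetic is used for clarity rather than efficiency.

```python
from fractions import Fraction

def affine_from_three_points(src, dst):
    """Return (A, w) with p' = A p + w from three non-collinear point pairs."""
    (x0, y0), (x1, y1), (x2, y2) = src
    (u0, v0), (u1, v1), (u2, v2) = dst
    # D has columns p_1 - p_0 and p_2 - p_0.
    a, b, c, d = x1 - x0, x2 - x0, y1 - y0, y2 - y0
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("source points are collinear")
    d_inv = [[d / det, -b / det], [-c / det, a / det]]
    # Dp has columns p'_1 - p'_0 and p'_2 - p'_0.
    dp = [[u1 - u0, u2 - u0], [v1 - v0, v2 - v0]]
    A = [[sum(dp[i][k] * d_inv[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]
    w = [u0 - (A[0][0] * x0 + A[0][1] * y0),
         v0 - (A[1][0] * x0 + A[1][1] * y0)]
    return A, w

def apply_affine(A, w, p):
    return [A[0][0] * p[0] + A[0][1] * p[1] + w[0],
            A[1][0] * p[0] + A[1][1] * p[1] + w[1]]

# Hypothetical data generated from A = [[2, 1], [0, 3]] and w = (1, -1).
src = [(1, 1), (2, 1), (1, 3)]
dst = [(4, 2), (6, 2), (6, 8)]
A, w = affine_from_three_points(src, dst)
mapped = apply_affine(A, w, (2, 5))
```

Any fourth coplanar point, such as `(2, 5)` here, is then transferred by the same `A` and `w`, which is the sense in which three points suffice under parallel projection.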

B Cross-Ratio and the Linear 
Combination of Rays 

The cross-ratio of four collinear points $A, B, C, D$ is preserved under central projection and is defined as:

$$\frac{AB}{AC} \Big/ \frac{DB}{DC} = \frac{A'B'}{A'C'} \Big/ \frac{D'B'}{D'C'}$$

(see Figure 13). All permutations of the four points 
are allowed, and in general there are six distinct cross- 
ratios that can be computed from four collinear points. 
Because the cross-ratio is invariant to projection, any 
transversal meeting four distinct concurrent rays in four 
distinct points will have the same cross ratio — therefore 
one can speak of the cross-ratio of rays (concurrent or 
parallel) a, b, c, d. 

The cross-ratio result in terms of rays, rather than 
points, is appealing for the reasons that it enables the ap- 
plication of linear algebra (rays are represented as points 
in homogeneous coordinates), and more important, en- 
ables us to treat ideal points as any other point (critical 
for having an algebraic system that is well defined under 
both central and parallel projection). 



The cross-ratio of rays is computed algebraically through linear combination of points in homogeneous coordinates (see Gans 1969, pp. 291-295), as follows. Let the rays $a, b, c, d$ be represented by vectors $(a_1, a_2, a_3), \ldots, (d_1, d_2, d_3)$, respectively. We can represent the rays $a, d$ as a linear combination of the rays $b, c$, by

$$a = b + kc$$
$$d = b + k'c$$

For example, $k$ can be found by solving the linear system of three equations $\rho a = b + kc$ with two unknowns $\rho, k$ (one can solve using any two of the three equations, or find a least squares solution using all three equations).
We shall assume, first, that the points are Euclidean. The ratio in which $A$ divides the line $BC$ can be derived by:

$$\frac{AB}{AC} = \frac{\frac{a_1}{a_3} - \frac{b_1}{b_3}}{\frac{a_1}{a_3} - \frac{c_1}{c_3}} = \frac{\frac{b_1 + k c_1}{b_3 + k c_3} - \frac{b_1}{b_3}}{\frac{b_1 + k c_1}{b_3 + k c_3} - \frac{c_1}{c_3}} = -k \frac{c_3}{b_3}.$$

Similarly, we have $\frac{DB}{DC} = -k' \frac{c_3}{b_3}$ and, therefore, the cross-ratio of the four rays is $\alpha = \frac{k}{k'}$. The same result holds under more general conditions, i.e., points can be ideal as well:

Proposition 6 If $A, B, C, D$ are distinct collinear points, with homogeneous coordinates $b + kc$, $b$, $c$, $b + k'c$, then the canonical cross-ratio is $\frac{k}{k'}$.

(for a complete proof, see Gans 1969, pp. 294-295). For our purposes it is sufficient to consider the case when one of the points, say the vector $d$, is ideal (i.e. $d_3 = 0$). From the vector equation $\rho d = b + k'c$, we have that $k' = -\frac{b_3}{c_3}$ and, therefore, the ratio $\frac{DB}{DC} = 1$. As a result, the cross-ratio is determined only by the first term, i.e., $\alpha = \frac{AB}{AC} = \frac{k}{k'}$, which is what we would expect if we represented points in the Euclidean plane and allowed the point $D$ to extend to infinity along the line $A, B, C, D$ (see Figure 13).
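
The computation of the cross-ratio from homogeneous coordinates, including the case of an ideal point, can be sketched as follows. The names `coefficient`, `cross_ratio` and `euclidean_cross_ratio` are hypothetical, as are the sample points. As an assumption of this sketch, the scalar $\rho$ is eliminated by taking the cross product of $\rho a = b + kc$ with $a$, which gives $0 = (b \times a) + k (c \times a)$; the value of $k$ returned is the least squares solution of these three equations.

```python
from fractions import Fraction

def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def coefficient(a, b, c):
    """Return k such that rho * a = b + k * c for some scalar rho."""
    n, m = cross(b, a), cross(c, a)
    if dot(m, m) == 0:
        raise ValueError("ray a coincides with ray c")
    return -Fraction(dot(n, m)) / Fraction(dot(m, m))

def cross_ratio(a, b, c, d):
    """Cross-ratio k / k' of four collinear points given as rays."""
    return coefficient(a, b, c) / coefficient(d, b, c)

def euclidean_cross_ratio(xa, xb, xc, xd):
    """(AB/AC) / (DB/DC) from signed positions along a line."""
    return (Fraction(xa - xb) / (xa - xc)) / (Fraction(xd - xb) / (xd - xc))

# Hypothetical points on the line y = 0 at x = 1/3, 0, 1, 3.
a, b, c, d = (1, 0, 3), (0, 0, 1), (1, 0, 1), (3, 0, 1)
alpha = cross_ratio(a, b, c, d)

# Same configuration with D moved to infinity (third coordinate zero).
alpha_ideal = cross_ratio(a, b, c, (1, 0, 0))
```

With the ideal point, `alpha_ideal` reduces to the single ratio $AB/AC$, in agreement with the limiting argument above, and no special case is needed in the code.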

The derivation so far can be translated directly to our purposes of computing the projective shape constant by replacing $a, b, c, d$ with $p', \tilde{p}', \hat{p}', V_l$, respectively.

C On Epipolar Transformations 

Proposition 7 The epipolar lines $pV_r$ and $p'V_l$ are perspectively related.

Proof: Consider Figure 14. We have already established that $p$ projects onto the left epipolar line $p'V_l$. By definition, the right epipole $V_r$ projects onto the left epipole $V_l$; therefore, because lines are projective invariants, the line $pV_r$ projects onto the line $p'V_l$. □

The result that epipolar lines in one image are perspectively related to the epipolar lines in the other image implies that there exists a projective transformation $F$ that maps epipolar lines $l_j$ onto epipolar lines $l'_j$, that is $F l_j = \rho_j l'_j$, where $l_j = p_j \times V_r$ and $l'_j = p'_j \times V_l$. From the property of point/line duality of projective geometry (Semple and Kneebone, 1952), the transformation $E$ that maps points on left epipolar lines onto points on the corresponding right epipolar lines is induced from $F$, i.e., $E = (F^{-1})^t$.



Figure 14: Epipolar lines are perspectively related. (Diagram labels: reference plane, image plane.)

Proposition 8 (point/line duality) The transformation for projecting $p$ onto the left epipolar line $p'V_l$ is $E = (F^{-1})^t$.

Proof: Let $l, l'$ be corresponding epipolar lines, related by the equation $\rho l' = F l$. Let $p, p'$ be any two points, one on each epipolar line (not necessarily corresponding points). From the point/line incidence axiom we have that $l^t \cdot p = 0$. By substituting $l$ we have

$$[\rho F^{-1} l']^t \cdot p = 0 \implies \rho\, l'^t \cdot [(F^{-1})^t p] = 0.$$

Therefore, the collineation $E = (F^{-1})^t$ maps points $p$ onto the corresponding left epipolar line. □
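
The duality argument of Proposition 8 can be checked numerically with a short Python sketch. The matrix `F` below is an arbitrary non-singular matrix chosen for illustration, not an epipolar transformation estimated from images, and the helper names are hypothetical. The sketch verifies that a point incident with a line $l$ is mapped by $E = (F^{-1})^t$ to a point incident with $Fl$.

```python
from fractions import Fraction

def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def matvec(m, v):
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]

def inverse_transpose(m):
    """Return (m^{-1})^t for a non-singular 3x3 matrix."""
    def cof(i, j):
        # Cofactor (i, j); cyclic indexing supplies the sign automatically.
        r0, r1 = (i + 1) % 3, (i + 2) % 3
        c0, c1 = (j + 1) % 3, (j + 2) % 3
        return m[r0][c0] * m[r1][c1] - m[r0][c1] * m[r1][c0]
    det = Fraction(sum(m[0][j] * cof(0, j) for j in range(3)))
    if det == 0:
        raise ValueError("F must be non-singular")
    # The inverse transpose is the cofactor matrix divided by the determinant.
    return [[cof(i, j) / det for j in range(3)] for i in range(3)]

F = [[1, 2, 0], [0, 1, 1], [1, 0, 1]]   # hypothetical line transformation
E = inverse_transpose(F)                # induced point transformation

p, q = (1, 2, 1), (3, -1, 1)            # two points in homogeneous coordinates
l = cross(p, q)                         # the line through p and q
l_mapped = matvec(F, l)                 # image of the line under F
incidences = [dot(l_mapped, matvec(E, x)) for x in (p, q)]
```

Both entries of `incidences` are zero, because $(Fl)^t (F^{-1})^t x = l^t x$ for every point $x$.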

It is intuitively clear that the epipolar line transforma- 
tion F is not unique, and therefore the induced trans- 
formation E is not unique either. The correspondence 
between the epipolar lines is not disturbed under trans- 
lation along the line V\V'i, or under non-rigid camera 
motion that results from tilting the image plane with re- 
spect to the optical axis such that the epipole remains 
on the line V\V'i- 

Proposition 9 The epipolar transformation F is not 
unique. 

Proof: A projective transformation is determined by four corresponding pencils. The transformation is unique (up to a scale factor) if no three of the pencils are linearly dependent, i.e., if the pencils are lines, then no three of the four lines should be coplanar. The epipolar line transformation $F$ can be determined by the corresponding epipoles, $V_r \longleftrightarrow V_l$, and three corresponding epipolar lines $l_j \longleftrightarrow l'_j$. We show next that the epipolar lines are coplanar, and therefore, $F$ cannot be determined uniquely.

Let $p_j$ and $p'_j$, $j = 1, 2, 3$, be three corresponding points and let $l_j = p_j \times V_r$ and $l'_j = p'_j \times V_l$. Let $\bar{p}_3 = \alpha p_1 + \beta p_2$, $\alpha + \beta = 1$, be a point on the epipolar line $p_3 V_r$ collinear with $p_1, p_2$. We have,

$$l_3 = p_3 \times V_r = (a \bar{p}_3 + b V_r) \times V_r = a \bar{p}_3 \times V_r = a\alpha\, l_1 + a\beta\, l_2,$$

and similarly $l'_3 = \alpha' l'_1 + \beta' l'_2$. □

Figure 15: See text.

The epipolar transformation, therefore, has three free parameters (one for scale, the other two because the equation $F l_3 = \rho_3 l'_3$ has dropped out).
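
The linear dependence used in the proof of Proposition 9 is easy to confirm numerically. In the sketch below, the epipole `V_r` and the three image points are hypothetical values chosen for illustration. Every epipolar line $l_j = p_j \times V_r$ passes through the epipole, so the three line vectors are all orthogonal to $V_r$ and the determinant of the matrix they form vanishes, even though the points themselves are not collinear.

```python
def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def epipolar_lines(points, epipole):
    """Lines joining each point to the epipole, as homogeneous 3-vectors."""
    return [cross(p, epipole) for p in points]

V_r = (5, 1, 1)                               # hypothetical epipole
points = [(0, 0, 1), (1, 0, 1), (0, 1, 1)]    # three non-collinear points
lines = epipolar_lines(points, V_r)

points_independent = det3(points) != 0        # the points span R^3
lines_dependent = det3(lines) == 0            # the epipolar lines do not
```

Because `lines_dependent` holds for any choice of points, the hypothesis of Theorem 4 (no three of the four elements linearly dependent) cannot be met by epipolar lines alone.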

D Affine Structure in Projective Space 

Proposition 10 The affine structure invariant, based 
on a single reference plane and a reference point, cannot 
be directly extended to central projection. 

Proof: Consider the drawing in Figure 15. Let $Q$ be the reference point, $P$ be an arbitrary point of interest in space, and $\tilde{Q}, \tilde{P}$ be the projections of $Q$ and $P$ onto the reference plane (see section 4 for definition of affine structure under parallel projection).

The relationship between the points $P, Q, \tilde{P}, \tilde{Q}$ and the points $p', \tilde{p}', q', \tilde{q}'$ can be described as a perspectivity between two triangles. However, in order to establish an invariant between the two triangles one must have a coplanar point outside each of the triangles, therefore the five corresponding points are not sufficient for determining an invariant relation (this is known as the 'five point invariant' which requires that no three of the points be collinear). □

E On the Intersection of Epipolar Lines 

Barret et al. (1991) derive a quadratic invariant based 
on Longuet-Higgins' fundamental matrix. We describe 
briefly their invariant and show that it is equivalent to 
performing re-projection using intersection of epipolar 
lines. 

In section 8 we derived Longuet-Higgins' fundamental matrix relation $p'^t H p = 0$. Barret et al. note that the equation can be written in vector form $h \cdot q = 0$, where $h$ contains the elements of $H$ and

$$q = (x'x,\ x'y,\ x',\ y'x,\ y'y,\ y',\ x,\ y,\ 1).$$

Therefore, the matrix

$$B = \begin{pmatrix} q_1 \\ \vdots \\ q_9 \end{pmatrix} \qquad (1)$$

must have a vanishing determinant. Given eight corresponding points, the condition $|B| = 0$ leads to a constraint line in terms of the coordinates of any ninth point, i.e., $\alpha x + \beta y + \gamma = 0$. The location of the ninth point in any third view can, therefore, be determined by intersecting the constraint lines derived from views 1 and 3, and views 2 and 3.

Another way of deriving this re-projection method is by first noticing that $H$ is a correlation that maps $p$ onto the corresponding epipolar line $l' = V_l \times p'$ (see section 8). Therefore, from views 1 and 3 we have the relation

$$p''^t H p = 0,$$

and from views 2 and 3 we have the relation

$$p''^t \tilde{H} p' = 0,$$

where $Hp$ and $\tilde{H}p'$ are two intersecting epipolar lines. Given eight corresponding points, we can recover $H$ and $\tilde{H}$. The location of any ninth point $p''$ can be recovered by intersecting the lines $Hp$ and $\tilde{H}p'$.

This way of deriving the re-projection method has an advantage over using the condition $|B| = 0$ directly, because one can use more than eight points in a least squares solution (via SVD) for the matrices $H$ and $\tilde{H}$.

Approaching the re-projection problem using intersection of epipolar lines is problematic for novel views that have a similar epipolar geometry to that of the two model views (these are situations where the two lines $Hp$ and $\tilde{H}p'$ are nearly parallel, such as when the object rotates around nearly the same axis for all views). We therefore expect sensitivity to errors also under conditions of small separation between views. The method becomes more practical if one uses multiple model views instead of only two, because each model view adds one epipolar line and all lines should intersect at the location of the point of interest in the novel view.
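
The intersection step of this re-projection method can be sketched in Python as follows. The example is synthetic and rests on assumptions that are not part of the original text: three cameras with identity internal parameters that differ only by translations $0$, $s$ and $t$, for which the matrices relating views 1 and 3 and views 2 and 3 are the skew-symmetric matrices of $t$ and $t - s$. The names `reproject` and `skew` are hypothetical, and in practice the two matrices would be estimated from corresponding points rather than written down.

```python
from fractions import Fraction

def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def matvec(m, v):
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]

def skew(t):
    """Matrix [t]_x such that matvec(skew(t), v) equals cross(t, v)."""
    return [[0, -t[2], t[1]], [t[2], 0, -t[0]], [-t[1], t[0], 0]]

def reproject(H13, H23, p, p_prime):
    """Intersect the epipolar lines H13 p and H23 p' in the novel view."""
    l1 = matvec(H13, p)
    l2 = matvec(H23, p_prime)
    x, y, w = cross(l1, l2)          # intersection of the two lines
    if w == 0:
        raise ValueError("epipolar lines are parallel or coincide")
    return (Fraction(x) / Fraction(w), Fraction(y) / Fraction(w))

# Synthetic configuration (assumption): a space point X seen by cameras
# translated by 0, s and t, so that p = X, p' = X + s and p'' = X + t.
X, s, t = (1, 2, 4), (1, 0, 0), (0, 1, 0)
p = X
p_prime = tuple(a + b for a, b in zip(X, s))
H13 = skew(t)
H23 = skew(tuple(a - b for a, b in zip(t, s)))
novel = reproject(H13, H23, p, p_prime)
```

The recovered location agrees with the direct projection of $X + t$, whose image coordinates are $(1/4, 3/4)$. When the two lines are nearly parallel the third component `w` approaches zero, which is the source of the sensitivity discussed above.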